W3C WebRTC Working Group, 15 Mar 2022

Previous Meeting Next Meeting

⏯

youtube image

►

From YouTube: WEBRTC WG interim March 15, 2022

Description

See also meeting minutes https://www.w3.org/2022/03/15-webrtc-minutes.html

02:44 TPAC 2022
06:02 WebRTC-SVC
13:11 WebRTC-Extensions
32:50 Avoiding the “Hall of Mirrors”
53:56 Display Surface Hints
1:05:06 getViewportMedia update
1:10:38 MediaCapture Extensions proposals

A

Recovering I start in banner.

B

Okay, so this is the march 15 2022 meeting of the w3c webrtc working group, uh the group of bytes by the w3c patent policy and only companies and people that are listed on this status webpage are allowed to make substantive contributions.

B

So today we're going to cover webrtc svc, webrtc extensions, a lot it's going to talk about avoiding the whole of mirrors, we'll talk a little bit about get viewport media and then media capture, extensions prs.

B

So for the future we are going to meet also on april 19th and may 17th at the same time, and that is up on the w3c uh to webrtc working group wiki. Okay, a little bit about the meeting the slides have been published to the wiki. We do need a scribe to figure out when we make decisions and write that down. Do we have someone, volunteering.

A

I'll take care of it.

B

Okay, thank you dom. uh As you heard, the meetings being recorded and the recording will be made public a little bit about the code of conduct. We do operate under the w3c code of ethics and professional conduct, we're all passionate about improving rubber to see. But let's try to keep it cordial and professional a little bit about the meeting tips.

B

Please type plus q and minus q in the gulumi chat to get into and out of the speaker queue and if you can use headphones or an.

C

B

Canceling speaker phone uh wait for the microphone access to be granted before speaking and state your full name. So we can get you in the minutes.

B

We probably will use the pull mechanism today, so to figure out the sense of the room or just get input from people a little bit about document status, just because something's hosted in wc repo doesn't imply adoption by the working group. That requires a call for adoption as you've seen on the mailing list. Editors drafts do not represent working group consensus, but the working group drafts do once they're confirmed by the call for consensus, uh it's possible to merge prs that lack incense is if a note's attached.

B

That indicates that okay, so now uh for our first poll- and I guess we'll need a little help from harold uh and the question is: are you considering attending in person the tpac 2022 and the possible answers? Are yes? No and you have no idea.

A

Let me introduce quickly why the question is here, so you know that the past two tea packs due to the pandemic have been purely virtual. uh With some of the pandemic impacts diminishing in some region. There is discussion about having a hybrid tea park this year in vancouver week of september.

A

The idea would be that there would be a greater emphasis in enabling remote participation, but still with potential people being together in vancouver meetings having meetings there.

A

The question is whether there would be enough people to justify all the work and expenses needed to get such a meeting set up and running in september, and so I guess, we've been asked by the event organizers, whether or not we think uh people would show up and to what extent. So this is just trying to gauge how many of us feel today that they would likely come likely not come, or they are simply too much in the dark to express a useful opinion.

D

Okay, I can start to pull.

D

So the you know you now have to touch the triangle stuff down in the right hand, corner of your screen to get to to get to get to the poll and answer it.

E

Is it anonymous or just out of curiosity.

D

It's anonymous or I have no idea whether it's recorded somewhere, who that goes for what, but it's uh seems anonymous.

D

D

I can't see who voted for what.

B

Okay, 11 votes so far.

D

We're a total of 14 people so or it might be 13 if I'm counted twice.

F

A

So that's actually only 12. I guess oh.

B

I have two of me: okay, yeah, that's weird yeah,.

A

B

D

A

Probably as much information as we'll get so three, yes, four, no four don't know.

B

Yeah yeah, so there you have it it's definitive whatever. That means.

D

Yes, definitely don't know we're def. Definitely.

B

Okay, all right so for discussion today uh the agenda as much as I talked about- um and this is roughly the time allocation, so we we to try to keep to the time I guess I'll be talking. So maybe harold you'll have to ring me in if I go over okay, so uh here are the pr's and issues we're going to talk about one in weber, csvc and two and robert c extensions.

B

So in webrtc, svc issue. 68 relates to the behavior of get parameters, and specifically the text that was there was unclear about renegotiation, so it talks about before negotiation as completed and then after and the question is okay. What does this mean about renegotiation? Renegotiation, like you did an initial negotiation. Now you're renegotiating?

B

What's it? What is the situation that applies then?

B

And if you read the original text.

B

You know, it seems to say it's kind of unclear what what happens during a renegotiation um and what uh you know what values you get back.

B

So what I would like to propose here um I mean to the extent you can look at this and give any reaction is the following text, where it's clarified that we're really talking about initial negotiation um and and after the initial negotiation. So basically, what I think happens is after initial negotiation. Get parameters will always return the currently configured scalability mode for each encoding, um and that that includes so, if you do, an initial negotiation and renegotiating you're still gonna get the currently configured scalability mode, um and uh so this is.

B

This is the clarification that I I'd like to put into the spec. So basically, before the initial negotiation has completed, you get the scalability mode uh that was last set by and transceiver and set parameters. It may not be the one that you'll see after negotiation is completed, because you know the codec might not support the mode that you put in um and if you don't provide anything then uh or if it wasn't successfully set, then you don't get you don't get a value for that encoding.

B

But after initial negotiation you get the currently configured one. So does this make sense to people or do you have comments or concerns.

D

This certainly works for me.

B

Okay, so you think yeah go ahead.

G

I don't know if we're done q, but I I I'm not sure this is correct because I okay, I.

B

G

uh Get parameters uh the I think the algorithms are already very explicit about what you get and it says encodings is set to the value of send encodings, which I believe is uh set uh and local description.

G

So we have this issue that um some of the information comes from. The pending can come from the pending description and some of it comes from the current and I think the codex. I was trying to look at the issue, so I wouldn't speak incorrectly, but right now I think they're different.

G

I'm trying to find the issue where I filed that.

B

Well, yes, I think what you're talking about yanivaris? Let's say you called set codec preferences to change the preference order, like that's a a pretty uh interesting one right and now your renegotiation renegotiating. Like here's a weird example, you were using vp8 and you got you were using, for example, l1 t2. Now you renegotiate to h264, which doesn't support any scalability happens.

B

So that's kind of a very concrete example where you, after you finish the vp8. Your get parameters is returning l1 t2. um Now you call psychotic preferences at what point do things change.

B

G

Yes, even without get calling said codex preferences, I think there's a question of that. Get parameters may return different information depending on whether a negotiation renegotiation is currently happening or not, because uh the remote offer could can well wait if you have a local offer that might affect the. If you just changed something in the local offer.

F

That might affect the results.

B

Right so harold, um can we just talk about that specific case that I mentioned switching bpa for h264?

B

um So uh what do you think happens during the renegotiation before I? I get confirmation of the change to h264, I'm still going to get l1 t2 back right.

D

My my reading would be that as long as you're sending the vp8 you right, you should get l1 t2 back right and at the moment you switch to h.264.

D

You get that one t1 back.

B

Right yeah, that's what I would expect and I wouldn't yes, I mean that's what I think this text is trying to say, which is just uh you know, while you're in progress nothing's changing right. Just because I called set code at preferences, I sent an offer nothing's changing, I'm still getting the I'm still acting as though it was vp8 and l1 t2 as soon as uh so. I think. If, if what we're saying is correct, then then the text does make sense right how about after the initiation, it's always the currently configured. One.

B

And so, like you said when it's when you're actually sending h264 it switches to l1 t1 anyway, I think yanivar. If you could write up your concern in uh on uh in issue uh 68, we can kind of go through it in more detail to see if there's a problem here, okay uh issue 68 on in uh weber, csvc, okay, yeah, just uh try to trying to figure out if there's just we'll, try to go into more detail and see if there's a problem.

B

Okay, um so I think that's the resolution is uh janet- will continue the discussion in in the issue. Okay, so now we're on issue 98. This is we've now switched over to webrtc extensions, and this issue is relating to disabling hardware acceleration, so fippo provided a long list of hardware, acceleration implementation bugs that we've experienced, and it's really quite quite an incredible little list, and I won't bore you with all of the details, but basically um hardware, accelerations changes are very hard to test.

B

um They often create problems, and so the question is whether we we can provide a way to disable hardware acceleration. uh An example of this is in web codecs. Today, for example, you in the video encoder config dictionary, there's a hardware acceleration member and it can take a value of no preference, uh prefer hardware preferred software, um at least as it's been implemented. This is actually not really a hint. The spec says it's a hint, but it's not really so, if you say prefer hardware, you're gonna get harder or robust prefer software again software robust.

B

So here are some I've been thinking a little bit about this, and two approaches came to mind. One is using set parameters to set some kind of a hardware preference and the other is set code of preferences.

B

When I was thinking about set parameters, it seemed like this wasn't necessarily a great idea, because it would have the limitation that you, as we said in weber, cpc. You can't change the envelope negotiated by offer answer within set parameters, and the problem is, if you had a hardware, only codec like a higher profile of av1 or something that could only be done in hardware, and you try to disable hardware. Acceleration you'd have to throw there or same for software. If it was software only you couldn't say.

B

I prefer hardware: if you only have a software codec and then the question is: is it really necessary to be able to switch midstream? Does this even make any sense like in the middle of a call, you'll just say disable hardware acceleration.

B

um So I have some concerns about whether we, whether set parameters, makes any sense for this. The other way to go about it would be to try to do it in set coded preferences, so it would basically change your negotiation preference. You then potentially would disable some profiles like, if you said, uh prefer software, and there were a few profiles that were only in hardware. Those would disappear from create offer from the sdp, then you'd, renegotiate, and so whatever you got would be something you could actually support, and some of these complexities wouldn't happen.

B

So one specific idea is to add a hardware acceleration member to the rtc rtp coda capability dictionary and that, as I said, would influence the codec and profile combinations that would show up and create author and great answer, and if, if you set to something to prefer software, you wouldn't get any uh profiles that depended only on harvard and then the other related question is what would happen in rtc rtp coder capability? How would you discover this um and media capabilities api?

B

I guess there's issue 185, which is allowing you to retrieve the codec capability from media capabilities. uh I guess johannes can comment on on how that's going um and then the question would be. If, if we did, that, would you have a hardware acceleration member be returned, or um would you just do what we do today, which is return, smooth and powerful and then supported, and that would be enough info.

B

So any any thoughts.

D

So I have uh yandiva on the queue and dom has raised his hand.

G

Okay, I think don't beat me.

A

So I'll go further, I guess the first question is: do we really expect developers to be keeping track of this hardware, implementation bugs and so to be doing the job of saying yes or no? I want hardware acceleration, wouldn't it be more efficient if browsers were in practice doing that if they know that hardware is buggy again, I it feels hard to be saying that all developers need to keep track of all the way. Hardware can go wrong.

B

Well, this came up because this stuff is very hard to test and it's actually slipped through and disabled. uh You know large scale, implementations.

A

But that would be true of if we were pushing that to developers as well right, and it means every developer would have to find this really hard to find bugs. Rather than having.

B

Not necessarily, they wouldn't necessarily have to know what was going on, but if they were, they got imp once you started seeing indications of problems like you know, massive massive issues that users are experienced, you could just flip the switch and just turn the stuff off and then so it's kind of a think of it as a kill, switch.

H

I want to add also that sometimes decoders work, fine, unless you are paired with a specific encoder which might not be very common. So in some situations that might you might not have any problems, and you just want to disable the encoder on very specific cases, but not just say, I'm going to disable decoder for everyone, because right there.

I

H

Be someone with a broken? uh We have an encoder of a generator stream that cannot be decoded.

B

Yeah, so that's kind of the next question uh florence is uh like if you.

C

B

That case, like this particular encoder, is busted. How do I know I've got that one.

D

G

Yes, so um for set parameters, I do believe we have read only properties and we have ways to to to say uh when prop produce can be said or not. So I don't uh my concern with putting it in rtc. Rtp coded capability is that it's actually information. That's returned from get capabilities, but.

B

G

Now we would probably double the number of codec entries and some would be and that's the.

B

Acceleration field yeah, it would just be three profiles you have well. Actually I wouldn't expect the question. The other question is: would you even return this um at all and um that's a separate thing you, you might not actually return it from get capabilities.

B

uh I'm not saying that you necessarily would.

G

It doesn't seem to fit well because uh the way you've shown the the web ideal here- you're you're, adding a hardware, acceleration preference to information about a codec, and it's not the codec. That has a preference, but it's and also we just spent a large amount of time moving. Some of the fingerprinting concerns over the media session. So um and we were talked about retiring the ability to detect hardware in webrtc, so we've.

B

G

G

Yes, but we've lumped in the the sensitive information into the existing category of media sessions to get it out of the webrtc working group domain right. So now that we're reintroducing, I want to make sure we don't reintroduce uh potential concern that people might have uh again uh if, unless there's a good reason- and maybe it would be better- it doesn't seem necessary to include that information here if we could instead uh phrase it as a preference which I think is easier to do in set parameters.

D

So so we have your han assess the as the medicare build. This expert he's nationally here.

C

uh Yeah thanks, so um I guess the way I understand this is that it should just be some way for like a short term. Okay, there was a bug in the browser and then the developer could then disable the hardware encoding or decoding for some short player down until it's fixed in the browser.

C

So I think from that sense I wonder well to me it seems like it maybe doesn't have to influence that, like the melee capabilities api, what's returned from that, it's more like okay! This is something instead of having everything everything broken you get like like some kind of recovery mode or something.

C

But I I agree also that it seems a bit uh hard for two developers to use it, but I understand the concern that it it's very tricky to test it. So that's a sense. I understand that it may.

E

C

Useful still to have this api.

D

So putting myself on the queue, and so one of the things I wonder about this.

D

The places people want to do this this particular thing routing around bugs is for specific implementations of the codec, and that requires that they know the implement the implementation of the codec. Otherwise they will overcompensating and just disabling this stuff blindly.

D

D

I do wonder whether milica built this is the right right way to go, because because that's where we can actually surface information about about the implementation, without uh increasing fingerprinting.

B

D

B

Yeah, I mean that's the place where you'd find out if it's smooth power efficient and supported.

D

Yeah- and we, if you if we were also going to be told that this is the android media, this is some the samsung hardware encoder with uh with the software version 35.

D

That might be the version, the information you need to to know for certain that you're going to kill it, that you don't want to use it. I'm not sure if that fits well with the current way. You're doing mitigate build this well. How.

F

D

C

No, it seems a bit uh I'd say complicated to get in there. I asked that's another question, I'm not sure I mean this seems to be quite browser specific still. What brows? What bugs they are? I mean I probably could work well, maybe in firefox and and sony chrome that has a bag or vice versa.

C

I was just think thinking of because we have this kind of block list for certain hardware and such let's just think if, if it would be worth investigating the the possibility to just be able to like download block list on the fly, somehow that our browser, your browser,.

C

Yeah so that you quickly can get out this kind of information. Okay. Now we have a certain bargain in this. uh For this particular hardware, encoder.

C

Like without, without having to having to wait for another release,.

D

That's a good question. uh Rio.

J

Yeah uh we do here so I uh say that yes, the uh the block list which uh is used in chrome, something like that gpu driver puglits, just json.

J

Those are the ones which we use, give information to our driver teams, because uh most of this issues are driver issues, as you know, and uh our driver teams basically look at that json file inside the chrome browser and try to fix a platform by platform. So yeah.

B

Okay, um I think we're running out of time. Do we have a kind of recommended resolution other than to go to github and continue to noodle on it? There.

D

Seems to me we don't have a resolution. We have a couple of uh approaches beyond the ones that were that have already been outlined, so.

B

Okay, so I'll uh give the floor to you, harold for issue 99.

D

Yep so now I'm not watching the queue anymore. So this is about the rtc rtp header extension capabilities that we're hoping that we can get out to our shipping soon so scenario: you have an implementation, it supports snazzy extension number five. It provides dancing videos by default, not seven offers, since the browser is capable of it. It's listed and can build this, but when you create an offer, it's not there and you don't know why.

D

Now is this a problem.

D

It is a problem, then we should make the surface the information somewhere this this isn't done by default.

D

If it isn't the problem, we don't need to change anything.

D

When I started writing the slides, I kind of thought when you create an offer and do this at local description, dance and so on.

D

You can get the information out there by querying what kept what extension can build this? How I set after you after I've, set the defaults, so you can expect basically inspect the offer using the api or using some some sap extensions. So we might not need this question is: do we.

C

D

A

uh I guess is anyone asking for it.

B

Well, bernard yeah, I um I think it's a convenience uh in the in the use case. You describe right there and there will. uh There will be scenarios like this, where you don't want to set it on by default, and yet you would like to understand why.

B

So I don't know that it's absolutely required, but it does seem like it could be, could be a convenience to to the developer, because otherwise they they'd have to inspect the offer. As you say, on the slide.

G

Yes, I would say if this is for debugging, I think looking at the stp is fine, but if this is something that we imagine applications writing conditional code, for that would that would probably need an api, because it's someone going to write code that makes and some like if this doesn't show up, then I'm going to do this instead.

D

Yeah that the the most probable example would probably be that if the browser doesn't offer transport cc, then I'm going to fall back to some other question control.

D

I think you can shame this by doing doing some uh some negotiation creating an offer and dancing this and doing a dance by with a throwaway pair connection, so that you can actually query the result of negotiate negotiating it.

D

A

D

I'm lazy, I don't want to change anything.

D

So I don't hear any strong opinions, I mean vaguely foreign.

D

A

I guess my sense is that if there is a big demand for it, it sounds cheap to do if there is no demand for it. This sounds expensive to do so. Maybe it's a matter of getting the voice of developers to say how big an issue this is for now and like the idea, I think, could be brought up later. If it's not a narrow problem, and if it is, then we can iterate.

D

So if we, if we don't decide to edit now we'll ship the api without it.

A

Maybe that's the right question to ask what would be the cost if of shipping this addition later, rather than now, I could. Can we design this later as a backwards, compatible change or not.

D

The sketch I had in the bag was, it would be backwards compatible. It would just be an extra member showing up so yes, it's possible to add leisure.

C

D

So proposal such a close close, has no change and we're approximately on schedule. I think.

B

Okay, so we're going to turn the floor over to a lot.

E

Thank you. Can everybody hear me well.

B

E

Perfect, uh so I'm here to talk about the hall of mirrors, which is what happens when you capture a surface, draw it back to the surface from which you're capturing it and capture it. Recursively like that, so some of us have already tried, for example, sharing a single monitor and you're.

E

You see a preview of the monitor or a window or a tab, and specifically, I'm mostly thinking about tab, but this applies to all three of those next slide, please so, as mentioned uh you can get into that for in several ways and the problems that this creates are that it confuses the local user. It confuses the remote and remote users and it can potentially even produce mic hull, which is when you get a feedback loop on your mic and or on. You know, on the uh whatever you're capturing and nobody benefits right.

E

It usually happens by mistake, except when it's not actually a mistake, but then, uh during um during video conferencing, if you capture the current tab, that's usually a mistake. Next slide, please um just a second award before that. So uh one of the reasons that I'm focusing on tab is that if you're capturing the current win the current monitor, then that is usually not a mistake right. Usually the intention is okay, you capture the entire monitor, but then you quickly switch away from the video conference to something else and that's a bit more.

E

uh That's um a more interesting case, a bit more um a bit more delicate. So I'm only talking about when you capture the current tab.

E

So one solution that uh you could think of is you could say: okay, so how about the user agent just don't offer the current tab, and I argue that this would be a mistake, because there are legitimate applications that offer the user to capture the current tab and that is actually desirable. So, for example, if you want to take a screenshot of the current application and file feedback, uh I know there are some applications that are experimenting exactly with that.

E

um So you would want to allow the user to capture the current tab. Also, if you're recording something like, for example, if google me didn't offer recording on the cloud, one of us might have recorded this entire meeting, using uh using current tab capture and so long as we didn't actually that the person did not preview, the whatever they were capturing back to the screen.

E

There would be no hall of mirrors, so self-capture would be quite legitimate and unproblematic then, and I could come up with a couple of additional examples, but I think my point is made for now next slide, please so, given that the previous solution was not really desirable, what else could we do?

E

And uh I think that if we remember the display media gets this dictionary and we can extend the dictionary with another member, it can be include current tab. It can exclude current tab. It could be a couple of other shapes that we can discuss in soon. It can have a default volume and it doesn't. But the point is it's going to be a control that allows you to hint to the user agent, whether you want to see the current tab as one of the options or not and next slide please.

E

So then we come to the inevitable security discussion, and that is that whenever we think about influencing user selection, a lot of red uh red lights, turn on and star and ominous music starts playing, and we need to ask if there's a reasonability this time. Is that something that we should worry about this time? And I would argue that in this case this is not actually problematic, because the problem with uh influencing user selection comes in two flavors.

E

One of them is when you can influence the user towards selecting a display surface that is under your control, in which case you can embed there anything you want and that's going to be a problem.

E

I'm sorry I I won't be able to read this during the thing.

E

So the other uh problem is when you, uh when you influence the user towards something that might not be under your control, but is either inherently dangerous, like maybe it's the entire screen, or maybe it's something sensitive to begin with, like a specific tab with your bank account that might not be under your control, but you won't get that, and I argue that in this very limited case of saying I don't want to capture the current tab, you're actually moving the user away from those two cases, in which case the usual concerns we have about influencing user choice.

E

No longer apply next slide. Please!

E

So, then, assuming that we, um assuming that we agree on all of this, which is an assumption that we will soon test, uh we would have to ask the question okay, but what is the default value gonna be? And after a couple of discussions on github, I think that maybe the best thing to do is just to say: you know what there is no uh default value.

E

This is a hint each user agent will choose to interpret its the lack of it as they want, which is the current case by the way right now, there is not insane whether the user agent must include the current tab or not, and we won't be changing that uh next slide. Please and then some foreshadowing, like we won't jump directly into this.

E

But if we manage to make some progress on this right now, then maybe we can even expand the scope and say: okay, do we want something similar for excluding the current window and excluding the current uh screen, but this is probably something that we would want to potentially get to if we have time so if you could go one back, one or two slides and then I'll mute myself and listen for people to people.

G

Yeah, sorry, I'm actually not sure what we're deciding here I mean there's an issue. uh This is issue 209 that has more detail and.

G

What is being proposed.

E

Here what I'm proposing as one slide backwards? Please exactly so. I suggest that we add either include current tab or exclude current tab. It is not important to me which one it is so long as we don't have a default value and that one serves as a hint. It basically says user uh tells the user agent that the application is not interested or is interested in potentially getting the current tab.

E

G

E

G

I'm listening, okay uh yeah, so uh I um I like this api other than I think the default need should be false because I think that's the of all the sites that use uh screen capture I think most sites would rather let me back up. I don't actually think this is about um avoiding the hall of mirrors. I think that's a symptom uh that could be mitigated with other things.

G

Like uh you and mentioned, you know, having user agents uh block that the view of that or blur it or do other tricks with the video, but I think the use case is actually um that picking your own tab is often undesirable in many applications like in in this meeting here, um module the self-recording uh thing, which I think isn't the primary use case there, but I think long term.

G

We want uh self capture to be get viewport media and get display media capture of everything else, and I think most sites would be fine with that. So I would argue that it's actually desirable to um so. The question comes down to. Yes, there are some sites that that want self-capture to be part of the options, and I they should be allowed to express that. So, I think include current tab is the right boolean for that, but I I think the default is also important.

G

I think the default should be false, also there's a w3c design principle that says that booleans cannot default to true, because- and it's well explained in issue 209- that I link to the tag guidance on that that has to do with uh you. Always you don't want undefined to accidentally.

G

uh Do the opposite of what the web developer thinks they're doing. So so, if we wanted to have a default uh behavior of true, which I don't.

E

G

We would have to really agree on that.

E

We agree on the on the point of the true and that's why it could either be include or exclude current, but we need to discuss that right. um Yes, I'm sorry did you want to say something.

D

Ul is on the q2.

E

um How do we do this like? Should you n go next, or should I respond to what you never said, because there are five different points he has made and I would not be able to remember all five and then all of you ends.

D

E

Okay, uh so, first of all about default. True, thank you very much for bringing it to uh my attention and, yes, I think that we agreed there uh about un's suggestion of obfuscation, as he called it. That applies more to whole screen capture, but not for current screen capture, a current tab capture, because, if you're capturing the current tab- and you obfuscate that then you're capturing nothing. So that's a bit less interesting, uh so specifically for current tab, you are correct that hall of mirrors is not the only case. This is more of a symptom.

E

The case is that we are capturing something that you're not actually uh situated to handle right like, for example, um if you want just to work, we understand uh so, given that you like this api and that the only thing that we potentially disagree about is the default.

E

uh I would say that we should probably not take get viewport media into account when making this decision, because we have no idea how long it's going to take for it to be adopted uh by the w3c implemented and then adopted by web developers, and there is a significant risk of it being adopted by web developers, because it requires two different mechanisms which are not terribly common nowadays. So it could be that not many web developers would be able to use that api.

E

So I don't think that we should use it as a guiding principle just yet. I do think that we should avoid breaking current applications that potentially have millions of users, even if it's only for two days, I think that's undesirable, and I think that, for that reason we should just not have a default value.

E

G

um That doesn't change anything actually because it's a boolean so there's going to be a default behavior. But.

B

I'm actually going.

G

To clarify that I can, I can separate this from what implementation they're doing, because implementations excuse me, user agents are already allowed to to provide any choices. They want. It's totally up. The specification doesn't add any limitations there. What we're talking about here, I think, is a hint from the application and the shape of that hint. We can either find out which applications uh want the current tab or we can find out which applications don't want.

G

The current tab, independent of how the browsers work- and I argue that out of 100 applications, I think 99- would not want expect or have any use for capture of the current tab and one application would have used for it. So I think, would be better to have the default behavior applied to the 99 and have the one application include specify include current tab. Please true.

E

I I don't know how you go to the numbers of 99 out of 100. I think that if we spoke to one of those applications that do want it, they would give different estimations. I can tell you that I've got some histograms and self-capture happens millions of times per year. uh Presumably a significant portion of that is not accidental.

D

So we should go on with the u.n and down.

D

I

Yeah so, first about the security issues, um I would say in general that security issues related to tap capture in the current spect. The current spec is not really dealing with it. It's very light, and it's it's start it's starting to to feel I I'm starting to feel some pain there, because we uh there are some proposals that are uh more and more trying to control, to allow webpage to control what the browser will will show.

I

So what the user will pick and the more we are adding uh in terms of search control, the more we need to provide guidelines to precisely tell hey that catcher is tap. Capture is very specific, so you need to deal with it in uh in very, very good ways, and we we already need to to provide more guidelines there. So if we proceed with uh this uh to me, uh we in parallel really need to uh provide more guidelines.

I

um I think chrome already did did some guidelines already did some mitigations, so it would be good to uh to to provide such mitigations or at least uh expose what are the issues uh in general as long as this is a hint it's fine.

I

My understanding, though, is that I'm not pretty clear about it, but some implementations may see the hint and then they might remove entirely the possibility for the user to actually select the tab and if so, that's something new, because currently, as I understand it, the hints are only allowing user to to push together towards what is the more meaningful choice, but still the user, with a few clicks, can change like? Oh no, I already don't want to tap capture. I actually want window capture and that's still something feasible with the current hints with this hint.

I

It's not clear to me whether we will change this. That's at some point I want to highlight as well uh and and third related to uh hall of mirror. I don't think that this is solving whole of mirror at all. uh Clearly, it's uh it's it's a different problem.

I

uh It's, it might be fine to add the support for include current tab, but still it's a completely different uh problem and some native applications have actually implemented uh some blurring of uh uh current tab um to to solve the issue when you're sharing your screen or window and so on and for current tab.

I

uh If you have it and you're not happy with having the whole current tab, because it's doing a half mirror, you can always group to what what is meaningful in your page uh or ask the user to uh to select again, which would be really sad. um So I really think that for hall of mirror, we should keep the issue open and pursue it as a follow-up, and there might be uh some efforts or some some energy that we could provide there.

D

So I must admit that I I was listening to all that and I couldn't tell whether you were arguing for or against this particular extension.

I

um As I said, uh according the use case- and it's if it's only a hint, then that's fine in the github issue or whether at some point we would say you must remove pm3, and I really think we should not go there currently. So as long as it's certain, then what a clarification would be, uh for instance, is chrome planning to remove the entry all together if the boolean flag is true. um If that's the case, maybe we'll go in with interrupt issues at some point, because safari might not do the same.

I

So that's that's the tiny bit there where I would be uh a little cautious about this api but interviews. Why is that fine.

E

uh In order to answer the uncertainty that you've just voiced, could we go too uh to slide forward uh please so I'm suggesting the bold text at the bottom, and that is most definitely a hint, and I understand that maybe we need to rephrase it a bit. But what do you think about this general direction?.

I

um Yeah, it's uh it's different from what I understood from the github issue, but this is uh much better uh and that looks better to me.

E

So it's better, but are you supportive of this.

I

uh As long as we uh provide more security guidelines related to tap capture to the difference between current tap capture and over tap catches, and so on, uh uh I would I will be supportive. Is it possible and we keep the issue open for uh actually solving our smurfs as well.

E

So, of course, we can keep the issue because this only partially uh addresses it mitigates, so we can definitely do more work. Is it possible for you maybe to kick start the effort of elaborating security concerns that you uh worry about.

I

um I think that the chrome team and the chrome ui has done extensive work there. So it would be good if you could, uh for instance, express uh what mitigations chrome did for uh tap capture because you're the one that actually did the implementation and you're the one who are proposing to allow a webpage to more more easily select tabs, which are which have their own issues.

I

So, and you have done things like, uh for instance, if the page is navigating away, you're, making it clear in origins and so on, and we which are really great uh information and so you're way ahead of uh what safari is doing, because safari is not implementing that capture. So uh I can certainly find an issue, but I think that uh in terms of foods and how how much mitigations.

D

Have been done.

I

It should be grown that that does so.

D

We have four minutes left on this issue so done and then.

A

Yeah, so in terms of boolean default values, a simple way if this is an issue, is to move to a new value which happens to have two values, so I don't think we should get stuck on that design. Consideration I'm personally supportive of this hint as long as it is a hint really, then we are just helping the ua provide the best smoothest user experience, and so I'm supportive of that direction.

A

I agree that if we can get some of the great guidance from chrome on how to make this space easier to implement for everyone we'll get more progress overall, so I would also support events requests in that space.

E

Sure and joanna I would love to do that work. Only I'm not fully uh sure what you mean. So if you just help me later out of band just let me know specifically what you're talking about, and you know just file an issue. Tell me what specifically you're.

D

Referring to thank you, I think the conclusion of this point was that we we will continue discussing. It seems that people who are arguing about it uh want to want to go for making this a hint and not total, never show this tab, but people are generally positive towards something like this.

D

Okay next is here.

D

Display surface hunting.

E

I think intel was supposed to be in between uh okay, I see uh but meek right. I see you.

B

Alright yeah, we rearranged things a bit since, hopefully this will be short. Okay,.

C

B

Do have additional time for this. A lot don't have to do it in two minutes.

E

Okay, uh I'll try to do it in two minutes anyway. Why not uh so uh we've discussed for a long time a similar issue to the one just uh um just introduced. So let's please keep them separate, okay. So what everything we've discussed right now separate new issue. Another problem uh that sometimes happens is that the application wants to hint to the user agent that it is especially well geared to handle windows or tabs, or something like that uh and this there are all sorts of ways that we could do that.

E

But the question of there is the question of how and there's the question of: should we do it, and hopefully, by now, we've already agreed that we should and that it is helpful because it can help to actually push the user towards tabs, which are inherently safer than, for example, the window that contains the tab right because then, for example, you're not sharing the url bar and the history, and maybe the set of extensions you have and some of your bookmarks.

E

So generally, if you are gonna share a browser better to share a tab, and there were a lot of other reasons for why we might want to do that and now, after nine months of discussing the latest iteration, I hope that we could give birth to this particular proposal and because we've been unable to really completely convince each other about which way to implement this. I suggest that we compromise on the average of everything at the list change.

E

So in my mind, this is to a not introduced any new mechanisms to use the current mechanism established mechanisms, whether we like them or not, namely constraints. I think this is also good for web developers because they often file bugs on chrome, saying: hey, this constraint doesn't work, so obviously they already expect this to work. uh So that's number one and number two is: I think that we should keep this a hint and stay as open-ended about how this hint is to be interpreted by the user agent.

E

What do you think.

D

I

uh Hint is fine, I think that there's a compromise where we could say that underneath it's a constraint, we are reusing the constraint mechanism, but we change a bit of a web ideal that is exposed so that we remove all the craft uh that constraints bring in terms of web idea, and that would allow, for instance, whenever you are using exact.

I

It would just be disregarded your your you're skipping it you don't care about it, and this way you keep it constraints like everything, you're, you're, happy and so on, but still we're making things uh much simpler and uh much much more straightforward.

E

So you mean, do you mean reject if it's exact, but uh I forgot the word but preferred.

I

E

I

Okay, basically exact will not would not be defined, so it would be seen as any uh over field like exact 2 or exact z and so on. So it would be ignored and it would not be visible after you pass to web ideal and- and we can simplify things a little bit this way.

D

I would strongly strongly propose not making this that part of this proposal, but uh that's that's because I hate irregularities and to we have yaniwa, of course,.

G

uh Yes, I agree with harold on this one. uh I think uh exact is already a tight error in get display media so get display. Media is already. I think we did a good job of narrowing down the constraints mechanism. There's no advanced, there's, no exact, there's no uh min. um There is max for some reasons.

G

So since it's already operating on a reduced version of constraints, I think the issues that you mentioned aren't that serious uh severe for users and that I I so I agree with using the display surface here, because it already exists, it's already specified and if implementations haven't implemented it, uh if it's not showing up and get constraints, uh it's because some limitations haven't implemented the way they should. So I think this is a good use of that.

G

um As to the text proposed, I do believe there's some detail lost in the issue that I think I would be okay with this.

G

If we add that my concern is that applications that use this to to ask for a monitor, for instance, like uh use cases, uh schools where teachers want to see the entire screen of the desktop of the user, that that sort of that's not a use case, I feel, is something we should be providing that's a level of control uh that um is not conducive to the best interest of the user, which I in this case will be the student.

G

um So I think the uh I think we, uh I think I propose some language, that user agents should steer users away from for monitor capture would be appropriate here.

E

So uh if it is something relatively uh non-committed like should, then maybe you can talk about this, but I think that this is not actually helping, because user agents might otherwise uh so currently in a relatively big implementation, chrome, the default is um monitor and we would all like to move away from this, and an off-ramp would be to allow this to be maintained, as you know, as a hint that slowly gets deprecated if there isn't enough. Pushback.

E

We've discussed this on the issue and um I feel that also, if we just keep it minimal, we don't need to discuss, monitor. Firefox will not uh respect the hint uh for monitor and hopefully, chrome can one day stop respecting that one too, and at that point we can either add this language or just keep ignoring it, because it is implicit from the fact that it's a user agent decision, whether to regard the hint or not, and I think that would be an easy compromise right. So.

G

Well, that's why I propose should otherwise I would propose. Must yes, so I think with the should chrome should be able to uh do the off-ramp you mentioned right. I see tommy's on the queue.

A

Yeah again, this is a hint. um So what ela described as you know, you can do it or you cannot do it like firefox, is in a position to take a better approach by default. Personally, I feel that's good enough. I I don't at all object to having the shoot, but with a hint uh hinting on the hint feels also maybe uh a bit too much like.

A

I know again I if there can be consensus, I'm having the shoot great, but I think overall, we are like so close to solving an issue that is so important to so many people that I wouldn't want on us to get stuck on whether or not to include this guidance.

A

I think with a hint we give enough latitude for the right thing to happen at the end of the day, to be more specific, if chrome, which is the one that would need to fail, and he should feel that it's not helping helpful to have the should I'm not sure we gained much with having it so.

E

Yes to support what dom just said, I think that if we have a shoot that is later disregarded, uh that is, that does not serve anybody right. It just confuses web developers, so I think that we would be better off without it.

I

I don't I disagree there. um A spec is used to it by implementers. So if you're a new implementer and there will be new implementations, then people will implement it and will uh allow monitor by default and then at some point they will figure out. Oh no, then the specs should really have told me that I should have actually ignored it, and specs are also good for that, so they should will actually solve that for a new implementers. So chrome is an existing implementer. So this does not really care about the sheet.

E

Interesting, um I see the submarine in this point. Yes,.

A

So is it that you can agree on the food and agree on the overall program.

E

I think um how about the following compromise? What if we have non-normative language explaining to implementers that um a lot respecting a monitor hint comes at risk and that it's better to not do that.

D

Sounds good to me.

G

You ready, I don't see the problem with the should, though I mean this allows. Chrome chrome stated a valid reason. I think that they wanted to deal with compatibility, and this should allow that without compromising the the guidance of the spec.

I

Yeah to me should is exactly for for your keys there, where you, you want to do it, but it might be difficult. So.

E

E

Okay, um so am I hearing that uh modulus should issue everything else. We agree on.

I

um I'm not arguing on the surface, but.

E

Okay, so so am I hearing that, although we're not happy with anything, we're actually, okay with everything, modulus should.

A

Sounds like so should I record that as a resolution then.

E

The way I understand this, um please correct me if I'm wrong. If we take the current text and just add the point of should, then you would accept this.

I

I would still try to convince you on the github issue, but I guess I will fail, got a right here.

E

Okay, I heard that yeah.

E

Yeah. Never yes, awesome anybody else.

E

uh I don't have a gavel.

F

Okay, all right.

D

We have a decision.

G

Oh yes, so this is actually not an ask from the working group. It's just a reminder that uh we finally have a pr up on get viewport media. So we would like to do a call for adoption. So please have a look at that link at your convenience after the meeting. uh Just a quick recap: it's uh it looks very much like it display media except it's called get viewport media, it returns.

G

The same promise takes the same constraints and the it captures the top level browsing context, viewport, which is more commonly known as the current tab, even from an iframe and it's gated by various security mechanisms like cross-origin, isolated, a document policy viewport capture for a while, it was called html capture, but I think everyone agreed in september to call it viewport capture because then the name vehicle viewport capture is uh consistent throughout.

G

So there's a document policy, viewport capture required document policy, viewport capture for iframes, uh there's a user permission, viewport capture and a permissions policy for iframe, which is also called viewportcapture, and then this also requires transient activation and that's the same. Privacy indicator, requirements and constraints, video and audio. As get display media, so we didn't really talk about audio, but I I feel that there's really no reason to exclude audio, so it is in the current document. So um any questions about that yeah, two, okay,.

I

uh Looking at the uh web ideal display, media stream constraints is probably not what you want. You probably want your own version of viewport media and the second thing is about audio you're capturing the tab. So the question then, for audio is whether audio will be the system audio or whether it will be the current tab audio and that's something to think about.

G

Right the current document actually says it must not. uh It must not capture system level, audio.

E

G

E

It it should probably do even more. It probably should not even capture any other tab, so it's not just system, but it must be only from the current app. um I think that's the only sensible thing to do actually.

B

E

That was my intent.

D

And if and if, if it's not capable of isolating the audio for the current time, then it shouldn't capture earlier.

G

Yeah, so that is very much the intent of the document and uh I'm happy to clarify that, if maybe.

G

uh Just bear in mind for the there's uh there's a pr. If you look at the document right now, I think it does say html capture, that's uh I didn't merge issue number four, um so hopefully that'll be merged this thursday or before before call for adoption.

G

Any other questions.

E

uh This is not a question but more of a public statement. um I would like to say that I think this is uh modulo. The specifics right. The general intent of this is awesome, I'm very excited by this work and I hope to see this uh both finalized as well as implemented.

E

I just want to raise the concern that we don't know whether it's actually going to be adopted by web developers, so I think that we should stay away from avoiding other progress based on the fact that this is upcoming, and I would also like us to keep in mind that if adoption does not happen, we might need to relax a couple of the requirements uh which might be difficult, but we might need to do so or deprecate. The whole thing.

I

So related to that, do we have like? uh Did we reach out to some web developers about get report media and, in particular, the constraints related to uh cross-original isolation and whether that would not be that would not slow down the adoption of this api.

E

I actually did in the beginning of last year and they told me that they are wholly incapable of adopting this at the time at that time. I don't know if this has changed, but at that time this was a bloker for them, naming any names.

B

Yeah I've heard that as well.

G

Well, I agree that uh I think we're taking the long view here, so uh I I think we're the working group has been very open to making uh a lot of changes to get this my media lately.

G

With with that understanding.

G

Sorry to your comment, uh you went about this display media constraints. I think that's mostly editorial, but uh yes right now, it display media constraints fit the bill, uh but we can, of course, uh if there's changes needed uh for that. We can duplicate that if.

I

Don't think we need any hint about user speech selection, for instance,.

G

F

All right thanks.

B

All right, okay, we're a little bit ahead of time, uh but we're going to hand the floor over. I guess is it you uh ritchie or yeah, okay, right.

J

Okay, so yeah hi, all uh we spoke about some of the webrtc and the features at tpac, 2021 and again briefly in the november interim meeting. So now that the initial peers have been out for a while, uh hopefully this audience had the time to go through uh the details. So we have a lot of to discuss. So, let's deep dive into the face detection next slide, please yeah!

J

Well, here's a snapshot of the ideal uh for the face detection proposal. I have linked to the full proposal in the notes section. Actually, if you click there, um we want a way to do face: detection natively on browsers. Instead of you know, cloud-based solutions and the ml framework way to unlock specific, like client capabilities. Last time we demoed face detection on chrome os, and there were few remarks so number one uh bernard her harold uen wanted face detection to be anchored to video frame uh defined in web codex instead of the media stream track.

J

I think the pr now reflects that uh number two, uh the boundary box issue, I think harold asked- was to make the api a bit more general and forward-looking uh something for the future as well. uh Not getting fixated only on the present, specifically, they ask, was to return something like a mask. Instead of a rectangle, we tried to reason a bit and even though right now there is no platform api supporting a mask.

J

We try to accommodate this request by returning a contour. The number of points describing the contour can be user defined like in that phase, detection, num, contour points, um settings and implementation wise right now. Obviously, we default to a four point. Rectangle I mean discussions with camera. Driver teams across orgs actually have revealed that underlying face detection algorithms do detect those points, but the main pain point has been standardizing. The number of counts of this point, so that's why they are not yet putting up a standardized platform api.

J

uh The next task was something very similar like since a few frameworks do return face mesh like the tensorflow returns, a 468 landmark face mesh. uh Can we have that also, so we have reworked it. The face detection mode to non-present, contour and mesh again uh mesh is not possible right away on the platform, but I think uh like for the sake of extensibility.

J

We have put it up for discussion.

J

uh Number four was uh face expressions. uh It did not get much support from the audience last time. So I guess it was on everybody's mind, but tim had voiced that face expressions was more subjective and there's a concern about the expression detection going wrong. um We have removed them from the pr totally, maybe maybe in future we could add blink and smite only the other, the entire list I kept. uh Maybe it's not yet ready for prime time but as of now in the pr I have removed the entire expressions we can add later.

J

If things improve, uh number five ask was to make sure face: detection works with the transform stream. uh So we put up a example. I think it's in the next slide. uh If we could go to the next slide, yeah we put up an example uh mostly uh to show how face depiction and transform team and workers can work together and yeah. And lastly, I think the last point which I remember is, I think I said I will bring up some pnp numbers to validate some. You know claims.

J

Unfortunately, I could not get the permission for such official numbers on a public forum, but uh let's say uh unofficially, we can say that uh around it's half of the time like a half of power, is needed for phase detection or we have done only on chrome os system so uh compared it versus media pipe.

J

uh Obviously, I uh uh before chrome up streaming we'll be sharing the uh numbers proper, proper numbers, so I mean from we just quickly hacked and got the power consumption as say two wattage for this total system power, while doing the phase detection our poc, and when we did the same resolution on media pipe, it was around 3.25 to 3.5 voltage yeah, but you will bring better like official numbers soon.

J

uh I do. I continue with face detection a bit more then take the examples or how? How do you want.

B

uh Can we open it uh questions yes,.

I

Yeah, I have some some questions um in general. I think that it's good to expose it in video frame. It would also be good to expose this kind of metadata to request video frame callback metadata as well, so that you can do that with canvas as well as if you have a stream of video frames, so that that's something that I would hope is not controversial, and we could do that.

I

In your example, I see that you were using exact constraints and exact constraints are evil.

I

So uh I would think that all these constraints would uh not allow exactly actually- um and my general question would be um so that there seemed to be like several switches and a possibility to provide parameters to the camera so that they can tune their algorithms and so on, and I was wondering whether in general camera I was wondering why we need like several switches and not just have one single switch in the case where the camera is either exposing data or not exposing data.

I

But it's not like trying to disable an algorithm, re-enable, an algorithm and so on. So, are you expecting, based on these constraints, that some algorithms will kick in or will be disabled and so on so and why there should be multiple constraints, or would it be fine to just have a single switch telling hey? I want to learn about uh face detection, so please provide me whatever you're grabbing your camera sensor,.

J

Right so there is, uh uh you want. A single switch like is face. Detection supported, something like that.

I

uh It's more a general question of why there are multiple switches and whether a single switch would not be simpler and good enough in general. And then, if we have just one single switch, then it's up to the web application to read what is available and do what it can with the provided data by the sensor.

J

uh Okay, um I mean um let's say um well, for example, the contour points right now, uh there's no not such support, so it looks a bit superfluous, maybe right now, but uh just for like to accommodate the ask about future and all those things we added that way. So.

J

I mean if, if you're looking at this example and you're saying that okay, this plus supports doubt, face detection norm, it's super clueless. I guess that's what the meaning is right. um Yeah.

I

I mean I I could see. I don't really understand why face dictation, name counterpoints with with b4, for instance, it seems like your webpage and you're trying to provide some parameters to the camera sensor, but my understanding was mostly like the camera sensor is anyway doing something. It's providing.

J

I

And the idea is to expose that data.

J

Yes, that is the idea.

I

So yeah yeah, so I'm wondering why this pyramid? Why all these parameters and.

J

J

Okay, so the the issue is with the example a bit. I guess not with the.

J

So can you can we go a bit uh uh previously? Please.

J

Yeah, uh what I mean is johan, um is there like? Do you think uh this idea looks fine to you or you think this is too much information or a light too too much superfluous data.

I

I don't know enough about all the camera systems that are available right. So that's why I'm asking you yes, yeah that kind of information.

J

Right so I can say that, right now the contour mesh is not at all exposed, but we added these kept these things for extensibility, so but say, landmarks and id and probability. These three are things the driver uses ids specifically for face. Tracking uh probability is like what is the probability that it's a human face and landmarks is like left like nose eyes, so these three are totally present there, the contour and mesh, not yet there, but um they are working on it.

J

Let's say in future, so we wanted to keep it there, but uh because there was ask about future extensibility, that's why we kept it. There.

I

I think, but so maybe that that means that enums are fine, but maybe we could like uh and dictionaries are fine as well, because enums and dictionaries are extensible, but maybe we can reduce to uh what is uh implementable right now and make sure that what's coming comes next will be a will be uh like the structure we are using will be uh extensible enough for those cases.

J

Yeah, that was the point with the contour, because uh implementation wise, we would, if.

J

Number of contour points equal to four. It means a rectangle which is implementable right now, the the platform api, the the people in the camera drivers in different organizations, so they are unable to standardize the number of points. As of now, that's why they cannot expose that as a platform api, because everybody has to let's say microsoft. Google, michael everybody has to agree on okay to have a good algorithm everything we. We would standardize on 16 points, but that agreement is not yet there.

J

So so that's why there is no platform api for the contour points, but hopefully in future, it's going to work that way, but you were actually right in the thinking that this, uh the camera is anyways using uh the phase detection to improve its uh three algorithms and we are taking those things from the stream. So that way, you are right.

A

We have quite a few, so maybe we should move on okay bernard, I think you're next.

B

Right so I I just wanted to ask some kind of very high level questions about the api surface.

B

So the reason I understand this is the way it is is because what you're trying to do through the supported constraints is essentially provide uh and capabilities is, is to provide basic parameters for the algorithm which you set in the driver and then essentially, you now have video frame dot detected faces because it's it's already been done by the drivers you specified is. That is that right.

J

Yes, so the driver is actually doing the work beforehand. We need to on like implementation, wise. We need to own the face detection beforehand like uh right. uh Oh yeah,.

B

Yeah, so that's why.

J

B

Done this way, as opposed to uh having like a promise based uh method that would give you you know to which you would provide the parameters um is essentially it's. It's kind of uh it's dependent on the driver. So if your camera driver doesn't support this you're not going to have the uh you're not going to have to basically.

J

No yeah the promised thing. What, if we do through the promise thing, then we will have to call something called detect phase and then the uh let's say, implementation wise. It won't be great uh right because it's it's going to call again what the driver has already done.

C

B

C

Hey I just wanted to.

J

B

Overall, why this.

J

B

As it is, okay great, thank you.

J

Like if you want more detail on that, I can tell uh at least for the windows part. What happens is if you do through the promised way. You'll use something like face analysis, whereas if you use this way, you'll use something called mf like uh micro, mdft stuff, and uh there will be no uh like.

J

If I, if I do through phase analysis, there will be duplicate cost for compute. It cannot take advantage of implementation. Details like using lower resolution for video handling, sub uh orientation of subjects and using camera roi all these kind of things. So that's why this way.

G

All right, I think, I'm next on thecube, um so I I think uh you answered some of this uh from bernard's question and that this is sounds like it's a camera api. Not I mean it's on media stream track on video frame, but it sounds like this would only be available if the source is a camera. Is that right.

J

Yes, yes, yes,.

G

uh To make it so that's what I thought so my concern there is that there's also there's another uh effort in the wicg, which is the accelerated, shape, detection and images api. So I'm wondering how this yes, my concern is that if, if that effort uh is also ongoing, it would.

C

Be unfortunate.

G

If javascript has to write things, one way, if the source is a camera in a different way, if the source.

J

G

J

Images of course, let me say that, of course, shape. Detection is incubated in wicg and provides a way to detect faces and, of course, there is a concern about duplication of effort. So, let's start by calling out the differences uh shape. Detection obviously works only on images, not on stream.

J

uh You can call it multiple times inside transform, but I it's not an efficient way to do this. There is no face tracking available, which means uh I mean face tracking is, is a very handy way to detect across frames and make those efficient.

J

Like I said just before, under the hood, a shape detection is actually using face analysis.

J

The windows api, which, which has a total duplicate cost for compute, cannot take advantage of lower resolution, cannot automatically handle orientation and cannot use camera roi to help 3a. So these are all implementation details. I understand and.

J

So I don't know I can ask riley and miguel if they're planning to continue it because uh uh intel was our team initially put up the chrome patches for windows, and I think we realized that this is a much better like in terms of implementation.

J

This is a better way which will give at least the perf results, much better so and uh shape detections phase detector is not yet shaped because it works only on uh windows as of now and now and android, maybe not on chrome, os so and the work is stopped right, intel actually implemented. The windows part.

B

Yes, I think and harold you can maybe provide more detail, but I think it is accurate to say that wicg work isn't being implemented. It's not going ahead. Is that right.

D

Don't I really can't say yeah.

B

Chrome status didn't indicate any any right.

D

Activity there, okay.

B

Do you have any.

D

Information on this one.

A

I'm afraid I don't.

J

I mean riley and miguel were supposed to do miguel move team, so I can ask riley, I think, he's a bit busy with other stuff. So I think this and the implementation was done by my team in indel. So I think uh resource wise. That effort is a bit constrained to move forward.

D

Yeah, so I think I'm next on the queue actually go ahead, harold yeah, so I I reflect a bit on yaniva's.

D

You know that we have functions today that depend on high quality, face detection, isolation where background blur is just one of them, and we have a number of other things, things that go on and I'm worried about having these different interfaces to solve the same problem where.

C

Some of the interfaces are.

D

At the moment, proprietary and some are came to be standardized, especially if we get into a situation where the proprietary interfaces are designed to provide much higher fidelity information than what the. What the standardized interfaces or the proposed.

A

D

Interfaces can provide. That was one of the reasons why I pushed back on on making making contours and meshes.

D

Available in the api, I'm still not happy with the design that seems to be totally focused on on making this an access api for for hardware or drivers based resources. Instead of making uh this this uh representation api that allows us to say in this video frame, you have the following: information objects.

D

It's kind of getting a little bit of that flavor, but uh with the the all the dictionaries and so on, are still very much about. We have to configure the camera driver to do this work for us when it would be perfectly acceptable to do it in many cases to do it ourselves right. So I was a bit surprised about about the number that you only gained the factor of less than 50 over media over media pipe.

D

But on the shape of this, I kind of think that this is actually a major new way of treating information about.

D

About media and.

D

I'd like to see this being proposed as a proposal, a proposal not just as a set of api patches that is having the explainer having the use cases having the the apis having examples all that stuff that we want to put together before we right before. We say that this is yes. This is the right api to change to have so I'm ready.

J

Okay, so, uh uh firstly, uh the uh we don't need to configure driver anything. We put this thing's contour mesh. uh Just to you know, for extensibility, there's no driver configuration needed uh if the. If the camera is in auto mode, it automatically happens. So uh this is mainly for like getting the information and that kind of thing uh regarding examples. uh uh I guess you have had a look at the pr. uh uh Do you think those examples were okay or you like?

J

I mean I added more examples later on I mean if we could.

D

I mean I I see I saw called examples last time I looked for it, I did not see examples of what applications you would use this for. What's the problem to be solved,.

J

I mean free, I don't know, say, teams web can use this or meet if they want.

A

Just to clarify, I guess nobody is asking you to uh improvise an answer to this. The the typical approach that has been taken in this group and others is to package answers to this kind of question in an explainer which would also be needed for a tag review down the line, and I guess that's I mean what I'm sensing here is uh before we can say whether this is striking the right set of traders.

A

We would need to better understand which use cases this fulfills, which developers are asking for it and which, which of the many options that you're pushing forward, are optional or necessary, and so on. So I guess I'm hearing the need to level up some of the conversations rather than necessarily diving in into the detail api surface.

A

J

I I mean I can obviously quickly write the I have the things ready. So we can. I can come up with an explainer in github and that's it and the explainer should contain examples and use cases and anything else. I guess it should be the standard explainer right from w3c. I can.

A

Yeah and maybe we drew you and I should meet so that we can figure out again also the right.

A

Way of bringing this proposal for review by the group, one of the things that I'm also hearing is that this is big enough, a proposal that it should like probably be a spec on its own other than a patch on a patch, which is what media capture extensions is today for for better hours, uh and so again we probably need to figure the logistics for how to make that happen as smoothly as possible.

J

Okay, but uh specifically from like from this group, so you think nobody is going to use this like. Is there a question about whether it's useful or not.

I

B

Okay, well, I I think the question is whether whether it handles all the use cases, people might want to use it with.

G

Yes, um sorry to skip the cube, but maybe I can clarify I. I worry that the other specification was dealing with. um You mentioned some benefits. This is tied to video frame, which I agree has benefits over photo and images, but it still feels like tying this to the camera. I mean the camera itself provides an a necessary function. I mean there's no way to simulate the camera and stop well.

G

You need to record my face, but as far as this feature, it sounds like it's going to be obsolete in five or ten years, because machines become faster and then a lot of use. Cases I think, would be to um to do this kind of processing with hardware on sources other than the camera, and that's what I'm missing here I feel like it might be better to have a.

G

I think, a long longer view api would not focus so much on the hardware acceleration uh in the being in the driver, but that this would be the way to express uh what javascript wants on the video frame um that could be done by the user agent. For example. Right I mean.

J

Without going to the implementation detail about exactly if it's doing on camera within the isp in the cpu or vpu, those things actually are abstracted there, so, uh but uh well, okay, so so hard harold. If I am understanding correctly, you want basically proof that uh background blur face detection. All these things are going to be used at all usable or I'd like to.

D

Say to see documentation that having a square rectangle around the face is going to be useful for any I'd like to have the right that having a square rectangle around roughly, where face might be is something that uh that is usable in the real application. I.

I

Can provide some help there, because I know some use cases where it's being used today, which are encoders and colors, actually optimize with a with a rectangle which is a face.

I

If there are such meta data and the meta.

C

I

In that case is only coming from the camera, so it all makes sense. um I really think that it's a good point that you define an api that is useful for uh driver generated as well as uh algorithm, like transform stream, like you, have a transform stream that is doing phase detection and you get some metadata, and hopefully we should have the same metadata coming from the transform stream or to the driver. That's that's the point of this vc fort.

I

If we are not able to get that or we're not comfortable having that, then uh it becomes much harder to sell the the proposal.

D

Yeah so right make a document out of it. That's a point.

D

J

You can't just.

D

Make these pull requests against extensions? It's the wrong place.

J

Okay, so and explainer right, okay, yeah, explainer and all those things I have uh sort of ready in my google docs. So I just know uh need to know um the exact location where to put them I can off. uh I can coordinate with dom to find a home for that. I have the data sort of ready, okay, so oh coming.

G

If I, but also my concerns, were about the api itself and whether this is the right shape for uh okay and whether you intend to solve uh sources other than camera,.

J

um So sources other than camera, meaning um like images.

G

Right, like I mean if I do a canvas capture and turn that into video stream track, for example, or is that or if I just have video frames from an application or something like that.

F

A

Okay, oh so yeah I mean yeah just to maybe rephrase it differently.

A

If there are cases where you can get the data directly from the hardware, it's great to expose it, but that api should also be usable in situations where the data is coming from elsewhere, so that there is a unified api surface on which to to operate on and.

E

I guess that's something.

A

That was mentioned in december that uh I'm hearing is still a concern with the current approach.

J

Okay, I'll try to document that.

G

Oh, I mean, even if even if browsers uh today won't be able to emulate this as software uh they might be able to in the future. So at least if the api is in the right shape, we don't have to redefine it.

D

So we're we're able to emulate a fair bit of this, and I wasn't today so right right.

D

That's strictly strictly less performance than the browser implementation.

J

Yeah, I can okay, I I'll add uh some data to that right in the explainer. uh I I'll just move, because I see there's not much time left I'll.

J

uh The background blur, as you can see, was like one of the main concerns was we, while we were thinking was whether we needed to combine blur and replacement together.

J

As of now, we are going to propose just the background blur, uh because the replacement again not many platform apis are there. So, uh of course, there is also no platform api right now to control the blur level, but obviously there are many frameworks where you can control, so I kept it there, uh but it it will hopefully work in future, but uh yeah.

J

This was a sort of simple api, like um so platforms, do some sort of in in-stream correction and the example we present later on you'll see that the users can either opt for this one, this api or, if the, if their platform, supports or use the framework if they want to differentiate what a native look and feel like uh like, in short, suppose you're on windows the way- and there is a windows app.

J

I don't know if you are using teams and you have the teams uh chrome, running on chrome or edge uh on on windows spectrum. It will look the same the because the underlying native call is same. Obviously,.

J

Any any comments on this this one.

J

We have okay, we have an example also right.

I

You um just to mention that um on uh some platforms, uh like ios uh macros, there's the ability for the user to switch on and off background blur, for instance, and it's outside it's fully outside of the control uh of the web application and it's fully dynamic.

I

um It does not so basically, web application could not unblur right if the user decides to blur. But yes, but it.

C

I

Blur the web application could blur if the user is not wanting to blur through the os settings. That's not something that is well supported, currently by constraints because constraints, usually you have a camera and the resolution. The highest resolution will not change in real time. So maybe we need some api. If we want to use constraints there, we might need some apis so that, for instance, a web application knows that at some point background blur was changed.

I

The user decided to to use the system background blur and the web equation will not be able to uh to set it back to false, for instance, and some api is missing, there um either event based or like. Maybe just an even saying, hey constraints have changed, or I don't know, but we probably need to to find something. If we want to support uh ios platforms, for instance,.

J

Okay- and I see other.

G

Yeah, it's me again well, I think this is actually a case where constraints work really. Well, I mean you because it does allow the application to uh state it's ideal and then the user agent can still override it and that's for whether an application should be allowed to specify exactly a background blur or not that we can always discuss. I think, but I generally support this idea, that I think that.

G

That the background blur seems to be a very popular feature that it'd be nice to have some explosive native support for.

I

uh I don't think that ideal constraints uh fully support everything there, because you you can it can be set after the track is created or after apply constraints or whatever. So we we don't have good support there to express all these things and the web application might want to be notified that the change of background blur from force to even though the web application is set back, one block to force, for instance. So there are cases that we we need some api there to uh to expose those uh that's performance.

J

Okay, any any comments you have on this.

J

um uh I'm I'm just thinking whether uh people feel it's. I mean everybody is using this right. So if we can provide a platforms api, uh I guess is I mean I'm trying to find. Is there any blockers, apart from what u.n suggested.

I

Just to be just to be sure about the the background blur level, it's a double right now and I.

B

I

On ios, it's not settable, so we will.

B

I

Expose it so do you know that platforms that would take benefit of it, for instance,.

J

It's not suitable anywhere but uh on any platform api right now. But if you use frameworks, like all the ml frameworks, you can adjust a bit of a background blur. So if, if you write your own c plus plus program put it into asm, and you can always do that.

J

But uh there's no, it yeah, but uh I think that platform teams are working on going making it exposed as.

I

Settable, it's um it's something that is difficult for web applications to uh to set because they might not know the actual algorithm. So maybe they will do something and say: oh and change it a little bit or provide a user selection and so on. But it seems like advanced and advanced case. So maybe there's a two-step proposal. There. First, the simple thing and then the double double suitable uh property.

J

Okay, so right now, just the boolean. You suggest.

G

Okay, yeah, I understand that as well. I mean, I think, the way to view this is that user agents may provide blurring, allow the user to blur their camera independent of this api. The question is whether user agents feel any value and letting an application have a button to turn it on and off. I think, and I could see an implementation go either way and say: no.

G

This is the user blurred this it's not changeable by the actual application, we're not going to expose that it happened at all or we're going to expose it and not be able to change it or- and I think applications should be able to read this and and if see, if the property is there, they could experiment with turning on and off and maybe expose a button. If that was useful but yeah, it does beg the question of how much blur.

D

To situations with audio we've actually encountered.

D

Cases where it was very, very valuable to tell that the user had been been manipulating settings that were supposed to be helpful in drivers, but really messed the things up, such as causing uh double double echo, detection and and the aec controls that we're fighting each other.

D

So that was so in. In that case, the most important property we had well. The most important control we had was the ability to to turn platform effects off and the most and the the the second most uh important feature was to be able to detect that uh platform and ex effects have been turned on, so that we cannot use it to turn them off.

J

Okay, okay, I understand I hear that we can remove the double as of now start off with on and off and later on.

J

Okay, even I'll make the change. I have one minute. Can we move to the next slide.

B

Well, uh I think, actually, rather than trying to dump everything into the last minute, I think we maybe should talk about how to move forward in general because there's you know a whole bunch of stuff here we didn't get to so.

J

Yeah, the the last three were not the last three were like just putting a bull lighting correction, face framing and eye base correction. All these.

C

J

Just bull, I just want to get a feeling, because I did not get any comments from on the pr just wanted to know that uh I mean these are all these are the features you will see working on facetime every everything is working there and uh there are obviously platform it it.

J

It works on lighting correction works on meat also, these days right so yeah and what we are trying to do is giving give meat and teams and and anybody the web apps uh a way to use the platform features directly instead of running uh the framework and obviously they are free to choose both it's just giving another option and suppose meet teams fails. The blur uh is working fine on windows, they can choose the this one and if they think no, they can keep on.

J

I mean just giving options to the.

J

B

J

B

Just want to uh also figure out, I guess, from the chair's input on how what are the next steps here for the rest of this? Do we re provide time in april? uh What do you think janibar and harold.

G

Are older, you want to say something.

B

Well, how about you yeah, never think you're still on the queue.

G

Oh um yeah, I forgot why I was on the queue, but um I don't think uh we have a strong interest to implement at the moment. So um I know mozilla's position on the similarly named shape. Detection api is that we're worried about complexity, variations in support within operating systems.

G

So we have it as a defer at the moment so, but uh I I think the advice I was given uh so far was good. Tomorrow.

G

We don't have any urgency on this matter.

B

D

B

D

For uh for the face detection, we have a pretty solid.

D

Way forward is which is get this into a document and get the get the justifications and use cases written up, and then we can evaluate whether or not to adopt this. I feel like these other camera controls would better fit us uh small chapters in that document.

B

Right right, right, okay, so put it all together in one document: that's what.

J

You're saying, okay: is there a location you want location of the document or just a github? uh My personal data pick a github and I get that.

A

I'll work with you on this video again.

J

Okay, thanks john.

G

Sorry, harold, maybe maybe I misunderstood, you were saying that this these go together or they can be separate.

D

I think I think it is better to have them together. I mean these are, if, if we accept the idea of uh of con of uh using uh the constraint api or any other api on media capture to control uh camera driver settings, then having one one big one document with all the camera drivers, I think, stinks in it.

H

G

Not opposed to that, but I could also see that media capture extensions has been used for uh for uh individual constraints in the past, like for uh focal length and that kind of stuff. So there's some.

I

G

There to keep it separate.

I

To me, it seems really different, uh like one one is in terms of complexity. We are talking about uh a constraint which is boolean or not uh compared to a constraint, plus exposing complex data with potential interaction with other apis. So it's not all the same thing so, and I think that one could progress much faster than the others as well. In terms of implementations for some platforms.

J

Okay, I got confused.

D

Yes, you guys come to contradictory advice, so that's.

A

Right so we drew, I guess I will need to get uh a clear input from the chair, so the ball is in my court. I I do think there is clarity that we want an explainer for face detection. Yes, whether the three or four additional camera driver settings sure this can just.

C

A

uh Small additions to media capture, extensions or something else together that I will have to come back to you sure I have. I have all.

J

The data ready in my google docs, so just let me know whenever you are ready and I can do it.

B

Okay, well, I think we're already over time, so I think we're gonna have to bring the meeting to a close. But thank you, everybody and, as mentioned we'll be back in april.

D

See you all in april, see you bye, bye, recording now ends.