Internet Engineering Task Force 100, 15 Nov 2017

Previous Meeting Next Meeting

⏯

youtube image

►

From YouTube: IETF100-NETVC-20171115-0930

Description

NETVC meeting session at IETF100
2017/11/15 0930

https://datatracker.ietf.org/meeting/100/proceedings/

A

B

A

C

Everybody welcome to NBC.

C

Sorry well, testing, okay,.

C

Welcome to net BC.

C

So, first off note: well should everybody should be familiar with this, but for this particular working group, please make absolutely sure you're familiar with it, because the goal this working group is to produce royalty free video codec. So we definitely need everyone to pay very close attention to the IPR rules. In the note.

A

C

Start blue sheets around I think we'll need one so I'm not gonna bother with.

C

What always takes the majority of the ten minutes of chair time? Note taker. We need a note-taker volunteer, very tourist notes. It doesn't have to be a every detail. Just major decision points.

C

C

Anyone uh on the media Co want to volunteer for taking notes.

C

In jabber room volunteer for taking notes, we're have to impose on someone now Oh Nathan did. Thank you very much. Nathan took excellent notes last time or the time before, appreciate that we have a jabber scribe volunteer, that's much easier, just relay the room.

C

Do you want to do it since we're up here anyway,.

C

And now we have our guest co-chair. Thank you very much. Man.

C

All right so quick agenda, Bash, just gonna, spend a few minutes, probably not ten, on on the status of the current documents and then Thomas is going to give us an update on the test document and Steiner will give us an update on Thor and 81 progress and comparisons followed by Tim, giving an update on dalla 81 transforms and then Luke is going to give us the latest chromo prediction from luma as it's being used in 81. Any changes people want to make to the agenda any other items to bring up.

C

If not, let's go on so quick recap of where we are on our milestones.

C

We have a milestone for the requirements document that that was a workgroup last called, but um because it's being used by some other standards body or some other industry consortium, there may be some substantive changes to it that we foresee coming up pretty soon so we're holding off on the Shepherd's right up for now and we're waiting to see if there's gonna be some substantive changes before we progress it on to the is G, hopefully, that'll conclude within with in November, so we're gonna update the milestone 2 to November I hope to get that done before the end of this month and the testing document we're gonna, update that milestone as well, because it's a not going to be concluded anytime soon.

C

We decided to keep it alive as a living document, while, while the codec candidates are progressing because we expect the test methodology is going to keep evolving. So we don't want to freeze the document pretty much early. So once we have a candidate we're comfortable with, and a testing methodology we're comfortable with as being pretty stable, then we'll we'll move it on the chain for the actual codec candidates.

C

We want a single merged codec and we lack that right now, so we do not have a candidate yet for the milestone for either the codec spec or the reference implementation. So we lack a merged candidate or merged codebase. So you see Tim and stun are presenting different tidbits, but not one consolidated, codebase or standard. So that's that's the glaring issue.

C

We need to deal with his workgroup and try to bring out the closure and Austin's going to get pushed out to July, so hopefully by then we'll have more clarity on on how to come up with a converged candidate and then finally, there's the milestone for carrying this codec inside of containers storage formats. So we'll push that out to end of 2018, because that work hasn't even started yet.

C

Any comments on the milestones or the proposed changes to them.

C

Alright, so let's go on to first item on our agenda Thomas, so you're gonna stop us on the muñeco. Yes, he is. Can you come on over to the queue.

D

All right, can you all hear me no here.

C

D

Is I'm Thomas steady from Missoula? This is the idea that VC testing craft version six.

C

Was is there something explicitly on this.

D

There's a just two toys yeah, so the testing draft has had very few changes between this version and the last version. There's really only two things, one of the small failing changes and changed. It's.

C

Almost your little faint K is there any microphone pick up that you can get.

D

Ramped up you one second,.

D

Is that better much.

C

Better thanks: okay.

D

um There's a small command-line changes and intestine arrows so next slide. So the first one is CLI parameters. We've made some small changes to the codecs that we test. One is that we removed a constraint for our leg and frames from olymic died. We previously had divorced this 225, but it turned out that was less than ideal value. It basically buffered way more frames and we technically needed to on the encoder. So we decided to drop that out of the command line. As you can see, that command line is pasted on the bottom there.

D

We just leave it up to the encoder to pick the maximum reasonable value, which I think is nineteen in the current encoder. But the general idea is with the command lines we want. As many me, you know as few possible constraint. We will not impose just a minimum constraints possible at encoder, big, the rest, so that's in the right direction. So is.

C

That supposed to be crossed through the lagging frames equals 25. Yes,.

D

That Gesser do they do that.

C

Should be removed from the command line, yep.

D

I pasted, okay I'm in there is to show that it was there, but yes delete that. That's all ideally I'd like to actually drop the auto ultra studying as well, and that should be automatic, but didn't change the code basis to do that yet.

D

So that's all second slide slide number for a dollar problem is that person we basically only allowed two very slowest encoder notes to be used for a dollar. This was the default or and for I. Think two, where we had a text file configuration and for a one that killed a CPU used, equals zero, and this became increasingly problematic, as you know, and I think in particular got much much slower over time was their current objective. One vast suite of videos.

D

That's in this testing document that take that took who was taking upwards of eight days to run the clips that became impractical for testing so obviously added an exception that allowed us to run faster test settings under limited. So those circumstances.

D

Basically, if you only compare an element to a limit- and you were just comparing a code- change that to change as a single feature- you're allowed to turn off a couple of searches and figure, the searches for exportation and exportation types that speeds up a lot and as long as you specify I did this, then it's a reasonable usage.

D

This is not certainly ideal I'm working on a better solution for this I think a better solution will actually involve custom. You know speed parameters to the encoder that are better tuned to match a real usage. The one bad thing is that we, by searching less partition, types and sizes, we kind of bias towards smaller partitions and not the rectangular partitions, which is could could affect some tools in the negative way. So we're trying to find something better. But this is a stopgap for now.

D

And that is basically the only two changes I've made to the document. I've not had to change the test set or to any of the videos at all. Those have stayed pretty much the same.

C

So this is Moe virtually from the floor mic and be lazy, since there's no queue just stay up here. The the testing methodology itself, though not the infrastructure, with the testing methodology itself, still specifies that the testing should be on the maximum compression, not not any other slower mode. Yeah.

D

So the there's basically a section at the end at a testing draft that basically says you know different things. You can test write it, so you could do like subjective tests and you can do across codec comparisons and there's like objective tests for tools and that one is the one that we allow into not used a very slow mode, basically by turning off the exportation search. Oh.

D

But if you're doing like.

E

D

Codec comparison, you obviously don't want to do that. I.

C

Mean I think I understand why it needs to be done for practical reasons in the infrastructure, but I I wouldn't think we want the spec to say this is how we test. Ideally, we didn't have an infrastructure problem. We would test for max compression right.

D

Yeah and that's certainly a issue with that doing. This is a very highly codec specific and implementation specific setting. So it's possible that which she does does not belong in the testing document and that could be a valuable feedback.

C

Okay I mean all codecs will have knobs to adjust. The compression speed, trade-off, I think the it's important for the testing document to state that the objective comparisons will be done at max compression. Otherwise it's a pretty hard. It's another dimension, another layer of curve, another dimension of curves that we have to look at to evaluate performance. I know: we've presented some of those and they're useful Steiners.

C

A lot of the speed complexity trade-offs before, but I think it's difficult to put those in the testing document as something that the candidates will be evaluated on right.

D

At the same time, so the the one for thing is I, don't want to evaluate if I specify this I don't want to evaluate the candidates in terms of you know, meeting the requirements criteria on this. This is purely for like individual tools and drafts, particularly I, do want to know, like people I think people later will probably present with CPU used settings with arts. Ro and I would like a way to normalize this, so that people people use the same CPU use settings and stuff if they need it for speed reasons, but yeah this.

D

What we could do is we could just say you know they are not specified. This in the draft. Maybe say that you know you, may you may use faster speed settings as long as you're justified it like external to the draft, basically try to be consistent. That's also a reasonable way to take it.

D

If that's, what people would prefer I can change it.

F

Jonathan Linux I mean it seems like with any codec. You know you know you can go. Your code can go to absurdly slow right, I mean this is doing. I've heard the phrase frames per day. As the speed I mean anything could do exhaustive, search every possible bit stream to see which one best matches you know in the extreme.

F

Keep your video in the extreme case, and that's just you can't say, will use whatever the invitation chooses to make the slowest possible encoding and I think there has to be some threshold of this is just absurd. We're not going to test this because it has no practical use.

D

C

Other people think.

C

I'll just know that that that absurd max compression is what's traditionally used in other other codec.

G

C

G

C

Happy to entertain more practical things in our testing draft.

D

And the other strong opinions, one away to other I, mean based on a current feedback. That sounds like I'd like to specify this, but made only for the a women testing and maybe not too explicit the exact parameters, and so, if anyone but well I'll change it basically, so it allows us to do this, but it doesn't specify two parameters and seems like the middle ground. I hear three ownage x2 that let me know otherwise I'll do that.

C

Like you have no objections, go ahead. Okay,.

D

That's what I'll do for the next draft version? That's all I have to present. So any other comments, or we can be done with this.

C

And I think we're good. Thank you.

C

All right next up.

H

Okay, hello, everybody I have to have it closed. I think I have some please what I have are some updates on on the Thor College and.

E

H

Have some updated charts on the performance and compression trade-offs? So this time the for github has actually been updated, it hasn't been updated for a while. So I guess that's about time. The main change is that I've added the support for the CDF filter through which was which has been adopted in everyone.

H

What was I hope did in everyone was the seen in past version, which I presented the last time. So this is what have implemented for Thor.

H

Another change is that I'm working on faster and simpler audio proceedeth.

H

In in authorities much faster than 81, so when I first tried to use the audio from from 81, the speed of Thor was roughly haft just for the future, but I think the audio can be improved and it's much more easy to experiment with the audio in Thor, because Thor is faster and I get the feedback much faster and whatever I do in Thor could be back ported into every one as well and last time I mentioned that I would like to get support for the dollar entropy colder in horror.

H

But there has been no progress on that. So just see that, but that's the first step towards emerged. Codec next, like this.

H

Since the last meeting there have been a few minor changes to see the this is due to a face-to-face meeting we had in the IOM group since the last time. There's a new skip block test.

H

What we had before it was that we didn't signal Anna strength for a future block. If all the coding blocks within that filter block were skipped, but the trouble with that is that you have to de call the entire filter block in order to know whether all the blocks are escaped and the Harvard people don't like that. They want to start a filtering as soon as I can turn our coding book as soon as possible, not wait and it until the end, because they don't like the Bur for anything.

H

So it was agreed to change that. So now the there's no signaling for the filter block. If the coding block size is 64 by 64, meaning that there's no petitioning of that block and the calling block is skipped that makes it possible to signal the filter strength just after the Skip flag for that block.

H

That adds a slight coding overhead, because it means that there will be more signaling and also possibly a slight complexity increased because it also leads to more blocks being fusion, but objectively, the loss is less than 0.1 percent and subjectively I don't think there is a great change.

H

It could even perhaps even gain an improvement because we get to doing more filtering next, like this, and also recently, everyone has adopted support for 128 by 128 super blocks, so that also influences how we do the Skip test.

H

Cdf still needs the signal at 64, + 64 resolution during the development of sealed F, which was merged with the dollar-driven to form seeded I tried, different block sizes and 64 by 64 was by far the best size. So we don't want to change that so for a large super block 128 by once, 22 single up to four presets in order to keep the the same field, the block size, the details haven't been what I decided. Yet how to do this?

H

We need to still need to investigate possible compression impacts, but the most simple thing would be just to expand the instead of testing for 64 over 64 blocks without partitioning and whether it's Kip or not just to check for 128 by 128 without partitioning next slide.

H

H

Running see deaf instead of C of F in for does give object against, just as it did in every one, I see gains around 1% or a half a percent. 2.2 percent I'll show the results on a separate slide shortly.

H

We see quite large gains for chroma of the 4% I'm, not quite sure why we didn't see again that high in in everyone, so I think I'll investigate that see. That does add more complexity or we will see a path, though it's it's more processing, but that's not unexpected. I also tried running CF on top of C death.

H

That gives not much gain, which is not surprising, since CDF is basically a superset of C of F, but I noted that if I greatly simplified, the CDF I do see a def just give gains, and actually it turns out that see a deaf Regan's, almost all of the loss, if I simplify the audio but having to an extra an extra pass or filtering I, think that adds a risk of over filtering and also it adds buffer requirements.

H

So it's sealed s still attractive for a fast, real-time encoder.

H

Currently, it's a good way to speed up for what I I think that it's much better I'd rather to do work on the sea. Deaf audio I think it might be hard to make it as fast as to see a deaf. Are they all, but it should be possible to come close without you to big losses. Next, like this another change, since the last meeting is in a VA one, we have three filters applied in cascades, D blocking see death and then loop restoration, and that.

H

Adds a lot of buffer requirements and again hardware people, don't like that. So there was a new proposal from arm to an Android contribution until Google and Mozilla to reduce the buffer requirements without stat there's an either a minimum of 30 lines of banned buffers. But with this new proposal it's possible to reduce that choose 16 lines next slide, please.

H

So the basic idea is to some normative changes and non normative changes and non normatively possible to do some shifting of the CDL or filtering. But the main normative change is that when loop restoration looks outside the super block that was produced by C death, it will roll. It will really deep blocked output instead, the CDF outfit and that breaks the dependency between CDF and Luke restoration.

H

So the changes that has that have been proposed requires no normative changes to see that it's mostly in loot, restoration and the changes have no impact on the other, compressed yets results, and it makes the buff line buffer requirements for everyone the same as for VP 916 lines, even though we have more filled filters in every one, and this is moving towards adoption along with new restoration, it hasn't formally be adopted yet, but it will next slide.

H

So, a bit more on the encoder complexity on see death, as I mentioned I was working on simplifying the audio and I think that can be improved even more just as a test. How far I could get I try to restrict the 50 to do no block level singling and when I do that I still get gains objective gain similar to see lbf but I think in that case the subjective gains are still much better than sealed F, because we will get the the directional part of C death.

H

So in that case the encoder will just have to select the optimal strength for the entire frame, but that's a quite small search space so and some other simplifications that I have tried, which work well is to select the damping used in the filter core based on the frame. Qp and I've also tried to decide the number of bits to use per block based on the frame, QP and sorry, based on the bitrate and friend pipe and I think that it still many ways to improve the CDF audio.

H

The the reason why I think it's important to have a good are the oval see that is that in a practical and coders Edith can probably replace some of the more complex tools, in particular in a the warm, maybe.

C

Not replace but just help yeah.

H

Well, not replacements Tannen, but if you want to do every time encoder you you probably wants, you can't use all the tools. It's simply too complex, next slide, please.

H

So these are the results that I got for adding see death in for heron, comparing just doing D blocking and with the deep locking, plus C death and in the low complexity case is now a 6.2 percent and the chrome app is not even better. If I look at the see ie de number, it's actually ten point three percent, which is I, think it's quite impressive, even in the harder the compression is, it's a six point three and in the high efficiency.

H

It's still five point, two percent and three point: one percent in the low that I and highly lay configurations. So that's that's, not bad I! Think next slide, please, and if we compare this with Z of the F, so the these are the gains that we get from replacing C of F with C death. This ee, I ee number, is 2.2 percent.

H

In the low complexity low today, configuration manager, ops, 21.1% in the high efficiency, a high delay configuration, so it's not a huge difference, but the main reason to add C, that is, to improve the actual visual quality and and in every one we did some subjective tests comparing see a path with Edith, and even though the change was less than 1% in in every one, people could still tell the difference so that probably points towards a real difference of at least five percent next slide. This mo.

C

From for Mike, again virtual for Mike, a question on the the the the gains for chroma, look like about three to four times the gains over all right, clue: I.

H

C

H

What I would like to investigate here is whether there could be an impose in the encoder. It could be probably not normative things that would have to be fixed, but it could be some kind of bug that see that is able to partially correct, but I, don't really know it's an interesting results.

H

What we could always hope it is that bug and and it will get further gains.

C

E

C

For for mark again so before you finish on CDF I want to raise one issue related to the requirements document that we expect some substantive changes in I think one of them may be related to support of four to two chroma format. Video and I. Believe CDF is one of the barriers to that, because the direction search does not support rectangular blocks. Is there any plan to address that in any way, so.

H

How it currently works is that, in the case of 42, we'd still do the filtering as normal for the luma plane, but the filtering is disabled for chroma label.

C

All filtering or disable Direction search only.

H

It's completely disabled, I think so.

H

There are ways to address that, like skipping the directional part and basically doing cell death at him as a comment yeah.

E

Terry vary from forearm. The the directional search has only ever done on luma. So what normally happens is the chroma uses the direction that luma found when it to orient its filters, and since there isn't a direct correspondence between the directions, we have a luma and the directions we have in chroma. When you squeeze the chroma blocks into a rectangle, then we disable the filter. Sorry.

C

Tim your little fan at the end. So what do you do for karma blocks? Then? Just.

E

Disable the filter, disable.

C

The directions are should totally simple: the filter completely the.

E

Whole filter completely because you can't disable the direction searches. That's only ever done on Loula. What you could do is just assume a fixed direction, which is what Steiner was saying and and just always filter with that direction, and that would be essentially like Co PF.

H

Yeah also lucious, but currently I, don't think in those solutions. Any of those solutions are going to into a v1 and it might be might be because people don't care I'm sure how important it is to have good performance for 42. Well,.

C

I mean, like I, said: I think there there may be some. There may be some changes to the requirements to make 42 more prominently, supported. Okay,.

H

Now, well, it is order, but the compression will surface lightly.

H

H

Last time, I had some graphs showing the compression and complexity trade-offs. So if you want and I have updated, those I also had some graphs showing comparison between the different codecs and I. Haven't opted those since the difference in Thor and isn't that big, it's just 1% so.

H

What I'm going to show are the compression speed relationships using our compressed yet with the objective 1 post test set and since ITF 99 there's an improvement in every one of about 5% and the encoder runs at roughly half the speed I've been using the loaded, a configuration and the BDR anchor is a the one.

A

H

Was in July last year, which is roughly equivalent to vp9 and next slide, please so.

H

Starting in July last year, at zero of the compression, the be dynamic goes down, which is good. So at the last meeting we had the VBR gain of about 20 percent, and that is now about 25 cents, and the graph has been steadily dropping with the additions of new tools in every one, and there are still some tools left not yet enabled so I expect this to drop slightly more, so we'll see next slide.

H

Please- and this is the complexity history note here, that the y axis is logarithmic and the y axis is the frames per minute, not they not seconds but minutes it's. It started at around 15 last year in July and is now round one frame a minute, so there's a change of the factor of 15 and it seems to be flattening somewhat. But again, this is a logarithmic scale on the y-axis I think this shows that the compression gains that we haven't seen in anyone don't come for free. It has a big cost.

H

So if we compare vp9 with everyone, I think currently, everyone is basically a continuation of vp9. If you plot it with difference. Complexity settings, so you have a big toolbox and, as you add, more tools to the codec, you get compression gains, but you also get that speed penalty, and the question remains whether that 2 vols box is a better tool box, not just a larger.

H

So we could replace tools in Li benign, whether simply better tools and get better performance with the same complexity, I'm, not sure, but if, if everyone gets set, opted and used as a standard, people will work on this for years and and it will probably eat at night, but we can't prove that yet.

H

Ok, this is what I had just give.

C

A quick thumb in the air of what the complexity is absolutely right now: relative history, but absolutely the vp9 roughly.

H

ah Certainly more.

G

H

100 times I guess yeah.

C

So I guess it's a crystalline so.

H

It will probably it hasn't, been a great focus to speed up everyone, so that will probably get more focused as the actual tools are finalized, but yeah right. The reference and gallery isn't that practical yeah. We can't simply I think the the specification says that we're supposed to run 4k sequences and bit, but we can't practically do that now, so nobody has been presenting the test results according to the specs, actually, because it's simply too slow and.

C

So, just to know that the those numbers were for the encoder and were roughly what the decoders yeah the.

H

Decoder speed is roughly 1/4 since July 2016, it's about 4, X's complex, is vp9 four times the complexity. I think it's closer to 16 times, but I think the main reason for that is there's still some simsim de optimizations lacking in vp9, so I think four times is a more accurate number.

H

Thank you thank.

C

E

All right stand in the pink box: I'm Tim, Terry berry I'd be presenting work on the dollar transform design. This is joint work with Nathan, eggy and Monty.

E

So, although I got this stuff started a few years ago, those two have really been doing. The bulk of the work lately so I think most of the credit of the recent developments goes to them. Next slide, um I'm going to talk a little bit about what our goals were in designing transforms for dello. One should be pretty non-controversial as we wanted an exact integer implementation.

E

It's just the way that video codecs have worked ever since 264. There's lots of iterative prediction with unstable filters, so you want an exact specified implementation so that all all decoders agree and there's no drift. um We also wanted to be able to support many different variations of the transforms so low bit. Depth hide the depth both square and rectangular, discrete cosine transforms discrete sine transforms, etc.

E

We also wanted high accuracy. So this is. This is 2017, as you just heard. We can have lots of complexity, so in times past people were afraid of having multiplies, and while we'd like as few multiplies as possible, we can have some multiplies in there if it gives us more accurate transforms.

E

That said, we want to keep software complexity as low as possible, in particular, paying attention to how things would be implemented in Cindy and at the same time we want to have reasonable hardware complexity, which means we need low latency for small transform sizes and for all these variations. We want to keep transform, reuse and embedded designs in mind, so that stuff will come I'll come along as you go through some of the slides here next slide. um So just just to start us off.

E

This is the the four point discrete cosine transform for each sixty-four. It is very low complexity, so you can implement this with 8 ads and two shifts. It has a few drawbacks. One of them is that that it is a non-uniform scale transform. So the coefficients that you get out, even though the discrete cosine is this unit very transform, where all the basis functions have the same magnitude of 1.0.

E

This gives you out coefficients that have different skills that you then have to multiply by and that usually gets absorbed into the the quantization step. So you know say: oh we're. Saving one multiply, but in reality, in in the way encoders are designed today we do rate distortion, optimization with several different possible quantization levels for all the different coefficients.

E

So you actually need to do several multiplies in there in order to get a consistent estimate of distortion that backs out this scaling factor and that those extra multiplies get multiplied by the number of different options that you search in the encoder, which is we just saw. You know this can be quite a lot.

E

One other thing is: is that when you start having larger and larger transforms, so this is only a four point transform you know you start going up to say. 64 point transforms, you actually need a very large table of constants for all these.

E

These scale factors, so we added the new goal that we wanted to have uniform scaling for all of our transforms, and you know that that will cost you four multiplies in this design, but as we go to larger and larger sizes, it turns out you can achieve this with much less than one multiply for coefficient all right next slide.

E

So this is the vp9 four-point, discrete cosine transform and I may pick on vp9 a little bit today. Just it's not because I think the vp9 design is bad, but it's actually a fairly standard textbook design for transforms, but I think we can do a little bit better and so I want to talk about some of the improvements we've made relative to vp9, just because vp9 transforms are the ones that I know the best.

E

So this is the 4-point DCT. It actually has six multiplies. They are full 32-bit products. So if you look at the bottom there we were actually taking two of these products and adding them together. So we need the full 32-bit result in order to do that, and then it additionally has eight adds two of those happen at 32 bits and then four shifts all right next slide.

E

So there are a few avenues for improvement. One is, is simplifying the multiplies. So if you looked at the 264 design like we could just scale the outputs of those that transform, then it would only cost four multiplies instead of six, but the 264 design is not a real DCT, it's only an approximation to a DCT, so it would be a little bit less accurate, but we're going to see in a bit we can actually do just as well with an accurate transform.

E

So the other approach for improving things is has to do with scaling. So the vp9 DCT adds this factor of a square root of 2 relative to a unitary transform um and in fact it turns out, as you make the transform, larger and larger each time you double the size of the transform. It adds an additional factor of of the square root of 2. So this is. This is sort of okay.

E

If you take the log of the width on the low that the height and that comes out to be even then, you can just correct the thing with a shift, but now we want to use rectangular transforms like an 8 by 4, transform or something along that, and now this scale factor becomes odd, and so we can't correct it with a shift. We actually have to correct it by doing one multiply for coefficient in order to get something that matches the same same scale as all of our quantizers all right slide.

E

So where does this scaling actually come from structurally next slide? um This is sort of the the textbook factorization of a type two discrete cosine transform. So it starts out with this stage here on the left, we're basically computing sums and differences of pairs of pixels, sometimes called plus 1, minus 1 butterflies or something to that effect. And then, after that, you can split the thing into a smaller, discrete cosine transform and a smaller discrete sine transform, alright slide.

E

So these butterflies here at the beginning, are our nonunitary, like if you you compute, equivalent basis functions for that and say what's the magnitude of the basis function, it's 1, squared plus 1 squared is, is square root of that is the square root of 2 right.

E

So that's where that factor comes from next slide and because this is recursive, there's another one inside there and as you as you, expand the transform by a factor of 2 each time you get an additional one of these, these factors of a square root of 2, and you also wind up having to do something in the discrete sine transform. That would is also expansion area like this. um If you want the scales to be uniform, all right next slide.

E

So we'd like to get rid of this extra scaling so that we don't have all these extra multiplies in our rectangular transforms. So one way we can do that is we can use multiplies and in fact, if you go back and look at vp9 s for point DCT, they actually already do this. So I, don't if you flip back to the slide 4.

E

So this step up here um is actually would be the same thing as a plus 1 minus 1 butterfly. But then it has scaled the outputs out after that, so that they match the discrete sine transform at the bottom there. So that's that's one way to correct this going, but that only got rid of it out of one stage and we're getting a set of these on at every stage, so that winds up being kind of expensive. So another approach is, we can restrict ourselves to only using shifts and ads and use asymmetric scaling.

E

I'll describe what I mean by that in the next few slides.

E

So we have. There are basically two different options: the the construct at the top there computes a sum and difference where the the output of the second component is have compared to what you would normally get and then the next one computes a sum and difference where the output of the first, the first output, is half compared to normally get, and as you see you can do this with with just by adding one shift in between the the two, the two additions or subtractions.

E

So what happens is instead of instead of doing an addition and subtraction and having both of the scales increased by a factor of square root of two. What we're actually doing is increasing one by a square root of two and decreasing the other by a factor of square root of two, so they become asymmetric. But overall you know the scaling is unity. So, like the determinant of this, this transform as a whole is still 1, and then we can cancel out this asymmetry in subsequent steps.

E

So next slide- and you do that- we use constructs like this, so the first one there computes a sum and difference where it halves the second input and then the second one does the same thing except it has the first input.

E

So these these kinds of constructs can can cancel out the asymmetry from the previous steps. So next slide.

E

We'd also like, as I, said, to simplify the multiplies, so all of these multiplies come from plane rotations between two variables, so, basically, in all of our transform factorizations we've decomposed it into a series of these plane rotations where we're taking two values, and we are rotating them by some amount. So we can. Actually, you know instead of doing that, as as a matrix multiply here, we have four multiplies in two additions. We can get rid of one multiply and instead add an addition by using a construct like that at the bottom.

E

Here, all right next slide, so we can actually also arbitrarily scale the inputs and outputs of these rotations, so just multiplying through um you can instead derive a series of steps which which looks like this, and the important thing to note is that that all of the all of the complex stuff there is basically just reduces down to a constant, and so it's it's x0 minus a constant times. X1 x1 minus a constant times p0 and then p0 minus a constant times y1.

E

So this becomes very simple and, and can let us absorb scalings in the multiplies as well all right next slide. So the advantages of doing this we get 25% fewer multiplies in general multiplies are much more expensive than additions in in software as well as hardware.

E

They all have this structure of X plus a constant times y.

E

So when we do that in fixed point, it's going to be X plus a constant times y plus a rounding offset, and then then we're going to shift off that you know shifted to the rights and only take the high part of that multiplier output.

E

So the advantage of doing this is that all of our 16 bits in D stays in 16 bits. So we don't actually need to compute full 32-bit products. We only need the top half.

E

So if we actually had to go compute a full 32-bit product, we could only do that in with half the throughput in a fixed size, Sindhi register, so ssse3 and neon actually both have instructions for doing exactly this kind of multiply. So it's a single instruction that will do the multiply. Add the rounding offset and shift the product over to the right, and so none of that has to has to expand out to a full 32 bits so that whole thing fits in in 16 bits next slide.

E

So putting all those things together, we get a transform like this, so we start off instead of having plus 1 minus 1 butterflies. We have these. These asymmetric scaling steps.

E

We use both types so that we get scaled outputs of in different directions that we then cancel in the next step, and then we have us our rotation down there at the bottom gets replaced by one of these scaled rotations using only 3 multiplies, so we have one more addition and three more multiplies than then the the h.264 transform, but we have uniform scale all right next slide.

E

The other thing, if you counted very carefully there are 3-1 halves on the board our unless on the slide, but I only say two shifts at the top, and that's because two of them are actually the same value.

E

So as you as you get larger and larger transforms, you'll also be able to share more and more of these shifts between the stages like this, and that's just because of the way that we arrange them all right next slide. um So expanding that out we can do an 8-point, DCT next slide or 16-point DCT, and that keeps going up to 64 points. I.

E

Yeah, it probably would have taken a few hours to make the drawings for that. So I didn't do that, but took a few hours to make that one for.

C

E

But the other other point to make is these: things do have embedded structure, so both the endpoint DCT and the endpoint discrete Coast are discrete sine transform, are embedded inside a discrete cosine transform that is 4 times larger, so that embedding actually skips a because of the asymmetries. So if you only go up one level, then then we're actually taking asymmetric inputs, and so it's not exactly the transform. You need basically.

C

Need two sets of.

E

G

Yeah, you need everything.

C

E

Yeah you even set in an odd set, essentially all right next slide.

E

So a few notes on accuracy. So all of these these right shifts and multiplies, introduce rounding errors. We want to keep those as small as possible, so we can the the way we go about. This is that we shift up the input by some number of bits before we do any of the transform, and then we do the full four transform quantize code, D quantize, inverse, transform and then, on the other end, when we finally get down to pixels, we ship down the output again.

E

So how much do you shift while we found diminishing returns at about four bits, and that was enough to make all of the discrete cosine transforms, match a double precision floating point implementation after rounding to the nearest pixel value. So with just a four bit up shift, we get the error down below one half of a pixel step for 8-bit input.

E

C

That open for Micah is that compared to a the same approximation or compared to a full, full DCT implemented in double floating-point. That's.

E

That's a full, you know bog standard DCT, you know not.

I

Any much straight.

E

From the formula giant matrix multiply implementation, that's good! What about what um so the error winds up being the same for higher improvements and go to the next slide.

E

I'll talk about that and something uncouple sides yeah, that's it. Basically, the the accuracy is less important for higher bit depths, because what you actually care about is accuracy relative to your quantizer and so higher bit depths use higher quantizers to get similar bit rates. So we shift up less for higher bit depths on basically 10 bits is a two bit shift in twelve bits. We have no that shifts, so it injects a little bit more noise, but it doesn't matter, but as a result, we can use the same transforms for all bit depths.

E

All right. You.

C

Know from the floor again so I think that also means you can use the same Sindhi one right for.

E

All the input- that's that's, that's correct! You can use the exact same implementation. That's nice! Alright go back yeah next slide. So how does this compare with vp9? So vp9 also shifts up the inputs, but by not as many as four bits, and then it shifts down the outputs by more than four bits and actually has to do it sometimes in between row and column transforms too and that's because they have this extra factor of a square root of two that that they grow by every every transform size.

E

So what's actually happening is, is the scale these vp9 coefficients grows as the transform progresses. So any rounding errors that you introduced early in the process get magnified as that scaling increases, whereas in dala all the stages have the same scale. So all of the rounding errors are injected at the same level and they do accumulate, but we don't magnify them all right. Next slide, that's the one we just did.

E

So another important point to talk about is the difference between scaling and dynamic range, so everything here has has orthonormal or unitary scaling right. So the magnitude of the basis functions is 1.0, but the dynamic range of the output still increases so the dynamic range here I mean the minimum or maximum output values you can actually have.

E

So all of your unitary transforms are essentially n, dimensional rotations and you can think of the input as a big n-dimensional box and the length of the diagonal of that box is going to be longer than the length of any of the edges. So as you rotate it, you can get larger values than you started with. um In fact, it's by a factor of square root of 2 every time and doubles, which you know is in addition to the scaling that vp9 does and in it they're not the same scaling.

E

So we still have this factor of square root of 2 and the size of our coefficients, but that's okay, because the l2 norm is still preserved by the transform.

E

So the question you might ask is: how big can the outputs actually be? Next slide so with a 4-bit upshift, all the transforms with 64 pixels or less fit in 16 bits. So that's a 9 bit residual 4 bit up shift and then 3 bits of dynamic range expansion which is half a bit for each of the powers of 2 and 64. So that includes your four by four or four.

E

Eight eight by four eight by eight four by sixteen and sixteen by for all of the column, transforms all the way up to 64 point also fit in sixteen bits. So that means that that 16 bits is the maximum size that you need for a hardware transpose buffer. So in between row and column stages, the hardware has to buffer the coefficients, so it can transpose them, which is a fairly significant gate, cost so being able to keep that small as nice.

E

um It also means that when you're writing Cindy, you can write a Cindy for the row transforms and it used to be simply for the column transforms and it all fits in 16 bits and for all sizes. And then you can have a separate version. Once things start going large at the 16 bits so comparing to vp9, they have larger intermediaries in the transforms, but they always shift their final coefficients down to fit in 16bits.

E

So we think this is a Miss optimization. It's it's actually just as easy to do this shift down and pack, while you're doing quantization. So we we have not tried to do this extra shift at the end. It also helps avoid double rounding and and simplifies rate distortion. Optimizations is you don't have to have any special cases for different scale factors depending on your block size, all right slide, a.

C

Quick question move from floor mic again, so you don't you don't have a fixed shift between the row and column stages. You, you accumulate a certain number of pixels total pixels. First before you do your shift um so.

E

We don't have a shift between the row and column stages at all, who.

C

Would if you have bigger than than a 64 pixel, so.

E

If you have bigger than 64 pixel, then we start going to wider Cindy once once you once the values start exceeding 60 minutes so.

C

You just go bigger than 16-bit intermediates yeah.

E

Well, I would never shift down.

C

E

Never shift down. Okay I mean you can shift down after you quantize right, because when you quantize you're gonna have the same values you would have had, regardless of what your shift was.

E

But yeah I mean that the point is you're. Gonna have to go. You're gonna have to go up to 32 bits in the transforms at some stage, um because we've eliminated this extra scaling. We do that at a later stage than vp9 does, and also because we don't do extra of shifting for high bit depth. We do it at a later stage in the vp9 dose. um So we can keep you in 16 bits longer, um but yeah I mean at some point.

E

You do have to go to go up to 32 bits and that's true both in vp9 in and in us. You know for the larger transforms.

E

Alright, so a few notes on reversibility. So when you have steps of this general form, where you take a variable- and you add to that variable sum function on all the variables except the one you're adding to um that's called a lifting step, there can be an whirring like that. That function could be arbitrary.

E

It doesn't have to be linear, like, as has no special properties at all, but because the function is not a very is not a function of the variable you're modifying it's exactly reversible right, so on on the the decoder side, you can just subtract off that function and get exactly the value that you started with.

E

What that means is we can make inverse transform by just reversing all the steps of our forward transform, and so it turns out that all of the steps that I have described so far that we use to build our transforms happen to be lifting steps all right. So so why is this good? Why would you want to do this?

E

So we really wanted reversibility in dala, because we used lapping instead of a deblocking filter, so do blocking filters. Have this nice property that they're low-pass on so they tend to blur out details over consecutive frames, um whereas on the other hand, forward and inverse lapping are matched so any any details that you have um do not get blurred out by by applying the the lapping filter. They instead just get shifted around and when you apply the opposite of the lapping filter, then they get restored.

E

So if those two are not exactly matched, then you'll build up. These rounding errors over multiple frames- and this is the same problem of you know. We essentially have an unstable filter so because we have an exact integer specification of our transforms on, you know there you would never get encode or decode or mismatch, but it would cost bits to correct these rounding errors in the encoder. So that was bad. All right next slide. um Do we actually need perfect reversibility, um so it seems to help compared to transforms that don't have it.

E

We've seen some small coding gain improvements, but it's probably not required, but we get it basically for free from from the structure of our design. We don't actually have it in dal anymore. So when you do this, 4-bit up shift and then do the transform and into the 4-bit down shift that down shift is not reversible, so that breaks it.

E

You can restore it by using twelve that references, even if you have eight put input data, basically just avoiding the down shift down to two by four bits. At the end and there's a nice blog post there by Monty that that goes through and shows you what the this error buildup looks like and what happens when you switch to twelve the references and it essentially goes away, but it turned out also that just using CL PF from Thor or the da lady ringing filter solves the problem by adding.

E

Essentially one of these low-pass filters back that that we didn't have an art, deblocking filter, um so that prevents these errors from building up right next slide. Moses.

C

From the for Mike again, so this is the implementation in dala that is not reversible because they downshift what about a v1. um So.

E

So a v1 also has the this four bit up shift in four bit down shifts, so it's also not exactly reversible. So.

C

All if you use twelve internal bit-depth, will you get the invertibility right.

E

So, and also to be clear, I'm talking about invertibility between going from coefficients to pixels back to coefficients right.

I

E

Step we needed for the laughing filter on so you still have invertibility in the sense of going from pixels to coefficients back to pixels, right yeah. So.

C

Where I was going with, it is, if you had a skipping skipping, the quantizers, then you could do lossless coding with the real transform instead of a instead of the current four point right.

E

C

So that first gains that's.

E

Actually something we tried back in vp9 with an early version of these transforms um and I think just replacing the the four point: Walsh Hadamard transform that they use with a four point. Dct was about 25%, worse in terms of the lossless bitrate, so I don't know if, if you allowed using larger transform sizes, if you you know, instead of just fixing everything down an adaptive.

C

Are do over write that the transform size you.

E

May be able to do slightly better than that, but but just doing a straight swap of four point oct.4 we.

C

Saw we saw the same thing with h.264 high profile. Somebody finally implemented high profile and switched everything to 8x8. You see loss, but if they intelligently switch between four by four and eight by eight there's, usually considerable gain. So maybe.

E

C

Adding some heuristics to optimize that may end up with a pretty good lossless codec yeah.

E

Alright Exide, so the other other nice feature of reversibility is the effect it has on dynamic range right. So, as we said, the transform coefficient values are larger than your pixel values, because your forward transform expands the dynamic range. Your inverse transform is also an n-dimensional rotation. So how do we know that it doesn't expand dynamic range right like if I have two coefficients X 0 and X 1, and they both just barely fit in 16 bits?

E

How do I know that X 0 plus X 1 won't overflow and the answer is, is because the transform is reversible, so all the values that I compute in my inverse are going to be the same, as in my forward, transform you know, plus or minus any quantization error I've introduced.

E

So this means that I'm only guaranteed to avoid overflows if the coefficients come as the result of transforming pixels. So if I decode random, garbage I might get random overflows, um but we can just define that that you know those cases aren't our undefined behavior right. We don't I, don't think anybody actually cares about the quality of decoding random garbage.

E

That's that's the same approach. 264 took so one note about discrete sine transforms. There are two types that we care about: type 4 and type 7. So for, inter predictions residuals, the the prediction error you get is asymmetric, so the error close to the edges, you're predicting from is much smaller than the error. Far away from those edges, which means you want an asymmetric transform to code them.

E

Then, if you say, ok, what's the optimal transform to use it winds up being this type 7 DST and get that by taking the a linearly increasing correlation, metrics and and taking the limit as the correlation approaches one and solving the eigen system and say what do you get the type seven DST pops out so type? Seven DST factorizations are much nastier than the type fours, which are the ones that we have embedded inside of our DCT.

E

So the type 4 is there at the top and the type 7. Is this thing down here and the real problem? Is this n plus 1/2 thing inside your trig functions, which means what this actually is is? Is a trig transform, embedded inside of a 2n, plus 1, sighs, fast, Fourier, transform and so pulling that out of there and still retaining a fast algorithm is a bit Messier since it's not a power of two. So.

E

Next slide type four transforms turned out to be almost as good in. There are already embedded inside of all of our DC T's, but our current approach is that we use type sevens for the very small ones, currently only four point: eight eight fight and then use the embedded type force for all of the larger D STS.

E

So the small ones are the factorizations, don't get that bad and it turns out to be where you can get at least some gains by using the correct DSD side.

E

So comparing overall complexity, the first three columns there are dollar TX on the next three columns, our TX mg, which is sort of the the AV 1 extension of the vp9 transforms to handle all these things like rectangular, transforms in and unify hide the depth and load that depth etc.

E

So we generally have a few more ads, but not that many more ads you can see like for the 32 point DCT it's 6.2 versus 6.0, but we have far fewer multiplies per coefficient right again for that DCT 2.7 versus 4.1.

E

So we actually can wind up with with 39 percent fewer applause. I. Think for the 32-point DST, we actually implemented the the Cindy for the eight-point DCT and directly compared that to the existing Cindy for for the Avon transforms, and it was benchmarked at 26.2 percent faster and that's mostly result of using using fewer multiplies and using cheaper multiplies right.

E

So none of our multiplies have to go up to a full 32-bit product and we don't have to do any 32-bit editions, so we're able to get higher Cindy throughput, you know, say just a small note for the discrete sine transform we're using the type 7 while txm G's is there's only using a type 4 @ for the 8 point, discrete sine transform, which is why ours is a bit more complex on particularly on the additions, and that's just a result of the type 7 factorization is is not as good alright.

E

So a few hardware considerations inter prediction requires reconstructed pixels from your neighboring blocks, so you think about it. This serializes the reconstruction of those blocks, including the inverse, transform part of that reconstruction, which is a particular problem for encoders and the decoders. You can sort of start the transforms early and it only serializes adding the residuals, but on the encoder side you need to know what pixels to transform so that that part becomes completely serial.

E

Unfortunately, when we do our 3 multiply rotations, we, those multiplies, are all changed consecutively like each one depends on the output of the previous one, which winds up being a bottleneck for small transform sizes for hardware. Alright next slide, so just for the 4-point DCT and DST we've replaced them with transforms that are not perfectly reversible and not lifting based, but we basically replace the three block. Three multiply block with a four multiplier. It's just like the matrix multiply.

E

So all the multiplies proceed in parallel, but we still only use the top half of that multiply output, so we still get full Sindhi throughput and then for the DST. We use a custom factorization that uses two five parallel multiplies.

E

So I said these are not exactly reversible, but they do solve the hardware latency problem, Exide.

E

So additional consideration, most hardware- is already multi standard and includes vp9 in all the vp9 transforms, and so they dedicate a lot of gates to having lots of parallel multiplies.

E

So we can replace a bunch of the serial multiplies in our rotations with these parallel multiplies without introducing any additional multiplies and so anything that anything that's of the form you know x0 plus a times X 1. U 0, plus B times u 0 and then y1 plus a times x1 ruies have this ABA structure for the constants. We can replace with this little more gnarly looking thing on the right, but if you reduce it down, it's 1 addition 3 multiplies that all happen in parallel and then 2 more additions.

E

So it's the same number of operations, but the multiplies can happen in parallel, so this is again no longer exactly reversible. So we're still experimenting to see what impact that has on accuracy and making sure it doesn't introduce any new potential overflows that would prevent us from from keeping our 17 16 bits.

E

So that's the everything on the the design of our transforms. I! Guess anybody have any questions. They can, through most of Mo's during the presentation, so Moses.

C

And Florida had one more final question kind of a broad one. So these look like they compare these transforms. Look quite they compare very favorably to vp9 and av1. Have you looked at Thor, which is basically a chibi? See if you look, the comparisons to the Thor transforms, so the HTPC transforms um so.

E

So we haven't done direct comparisons, at least in terms of, for example, coding performance um in terms of complexity like I.

E

They, if I understand correctly, the Thor storms are basically giant matrix, multiplies, um and- and so you know that you can get away with that for very small transforms, but as they get much larger, I think that this will wind up being significantly faster.

C

Right: okay about 15 minutes behind Luke, you can use the rest of the time if you want, because I think we technically have the room until 11:30. But if you want to speed up 11 that was 11:30. The next session starts alright, so Luke I got a five minutes. If you can I.

G

Suspect you can actually go to about 11:15, because the new chairs won't need a whole half-hour to set up so okay.

C

Thank you so so look just try to go quick on whatever you can there.

I

Can you hear me.

C

I

Alright, so you have the slides up for my presentation: can you see them? No I can't see them.

C

Almost slide one okay.

I

I can see them, though so yeah I'm gonna present an update to the CFL Draft for VC. So if we go to the first slide, chroma from luma is essentially an intra prediction tool, so it has no dependencies on other frames. It is only available to chroma planes and it basically works by predicting chroma pixels using coincident reconstructed, luma pixels. So let's go to the next slide to see the difference from what we proposed before so prior proposal was on a dowel implementation.

I

So now we've changed that to reflect what was proposed for a v1, most significant changes that we no longer rely on pvq. So prediction is now done in the spatial domain. We consider the only the AC contribution of reconstructed, pixels I'll talk about that a bit later, but that is similar to what was happening before in the pvq version of CFL. We use the existing DC pred, so DC prediction for the chroma DC contribution. This is already available in a v1 there's already fast implementations.

I

It requires no signaling and it is more precise than what is always used before. So that's also very interesting. So going on to the next slide, so the differences we can talk about, maybe Dola and Thor, which are you know, codecs that people know here. I already said before we went away from frequency and we're now going for the spatial domain. For prediction the Thor implementation is implied in the signaling.

I

The doubt implementation use the pvq gain and the sign bit to send the information we send the information explicitly using joint signs and an index value. The activation mechanism was a threshold for Thor. It was also signaled in doubt we have a special UV only mode in a v1, so anyone has separate prediction modes for intra and intra luma and intra chroma. So we take advantage of that to have this UV only mode called CFL pred.

I

We do encoder instead of doing encoder model fitting. We will do a rate constraint, search and we do know a decoder model fitting since the information in signal in the bits tree moving on to the flow of the operations. We see that if subsampling is used as though if chroma subsampling is used well, the luma surface will not be the same as the chroma surface. So we must do a luma subsampling that is equivalent to the chroma subsampling. That's being done, we subtract away the average.

I

This gives us the AC contribution in the spatial domain and this is usually on a chroma transform size block.

I

Then we'll decode, the signals scaling factors from the bit stream and we'll multiply that these are in q3 precision, but then once we multiply, that goes down to q0 and we add in the DC pred the chroma DC pred to that value, and that gives us our final prediction. So if we look at the codebook that we end up with on the next slide- oh okay, nevermind! Oh that's good! Okay! So alright! So basically, why do we go with the chroma? Dc pred?

I

Is that when we use the AC contribution, this each contribution is zero mean, which means that it sums to zero and doing so simplifies the linear regression equation for beta, which makes it the average of the Karma reference pixels and the DC pred is a very good predictor of that, as it tries to predict it using the neighboring pixels that are adjacent to the above and left borders of the block, and it requires no signaling.

I

So we don't have to signal the beta value, so alpha will be signaled, but beta won't moving on to the next slide. We have the scaling code book. So basically, this shows you when we do the search, what happens so we start in the middle of this grid and we can change the scaling factor for chroma correction with a chroma CR and chroma C B, and we move from negative to positive and you can see all the different tones that you can get. This, of course, is only a subset of the codebook we have.

I

It is goes from minus 2 to 2 in q3 so that it goes up in steps of 1 8 0 0 is not allowed, as it is DC pred. We pick our value using a rate constraint, search as I said before, since we are signalling the Alpha value the. When we do, we can't use a liner regression because that value won't be our D optimal. So what we do instead is we do the same thing as any other parameter in the encoder.

I

That requires rate is that we take the weighted rate and add that to the distortion value and pick the the parameter that minimizes that and that gets signal to do decoder. The next slide will explain how we go about signaling lists, so we will join both times. So there's gonna be 2 scaling parameters, one for CRN, one for CB, so we joined them together.

I

A sign can either be 0, negative or positive, and since 0 0 isn't allowed because that's DC pred, we have eight values which we sent to our multi symbol, encoder as an eight value symbol. Now, for each non zero scaling factor, we will send a value, excluding zero, but all the way to 2 inclusively and this again with a step of 1/8. This gives us 16 values for our multi symbol and this actually maxes out what Multi symbol, entropy coding can give us, which is a 16 value CDF going on to the next slide.

I

We can see results from our analyzer there's a link. You can click there. Sadly, it got moved behind the image. You can see the distribution of how many times modes get used. So these are UV modes that in a v1 we can see that there's about 44% of the time DC prêt will get picked, but a CFL comes in at about 17%. We observe it between 15 and 20% in different sequences for a v1, as you can see the other contender modes or best motors still slightly below.

I

So we see that actually performs other chroma modes that are available in the encoder and you can actually see this live in the analyzer in real time. Moving on to the results for subset one, we can see that there is a minus 4.65 CIE de 2010 tidge. It is the bt right, so it gives us a rate decrease with the same level of quality. We use the CIE P D value, because it is the only one that considers both luma and chroma and does so in a perceptually, uniform white.

I

If you click on the links below, you can see the full breakdown with all the values so subset, one are still images and objective one fast, our video sequences, as you can see. In that point, we are giving about an on average two point: forty one percent reduction. This is for a single tool, CFL overall of 81.

I

So that's pretty interesting. There is also psnr games. These are illumise and our gains. The reason for that is, since we have better predictions, we actually reduce the amount of bits so, and that gives us this metric actually gives better gains because it has same level quality, but it'll have fewer bits to do so, and since this is the area between a rate, difference and quality that I'll give you a negative value. So that's very good. If we move on to the next slide, we see that it is actually very good for screen content coding.

I

So here we we have, on average about 5% reduction for the screen for the gaming twitch data set, which is on slide. 11, yes, so notable mentions here are the minecraft sequence, so CFL alone gives a minus 20% reduction on both minecraft sequences that are in that test sets, and we see also good results for GTA and star graph at about 5% each first tiede 2000 and you know, still some significant gains for psnr luma.

I

Only so you know we're really impressed with the results from this tool and that's why we're proposing it as an update for the CFL proposal for a net VC I did I do that in good timing, perfect.

C

Thank you very much thanks for coming when sitting for a Tim to time, usurper any other questions for for Luke on the chroma formula tool, any other final items off the agenda all right so make sure that I get the blue sheets signed. If you came in late, anyone still here from net VC is make sure to get them. Where is the blue sheet by the way anybody needs it right? Please raise your hand we'll get it to you. Otherwise, thank you very much. I'll see you 101.

C

Thanks Matt for standing in for our co-chair.

B

B