From YouTube: IETF93-NETVC-20150722-1550
Description: NETVC meeting session at IETF 93, 2015/07/22 15:50
I guess we've already got an email — sorry, I'm going to tell them about it. No, not yet, we're just starting now. Okay, sure, thank you. So, just before we start: Alice will be taking some pictures of our working group today. If anyone has any objection to that, let us know — I can't imagine there would be.
(Some trouble with the speakers.) Okay — so, the first presentation is going to include some IPR details at the beginning. I just want to reiterate that, while it's fine to talk about factual matters on IPR, we're not going to be discussing any evaluations of IPR or any opinions on the relevance of any IPR in the working group. Everyone is free to form their own opinions about the validity of IPR; we won't be discussing it in the working group.
Thank you very much, Matt. Oh, that's true — we have the photo thing going on. All right, cool. This is the agenda. There's been one slight tweak to it, inasmuch as we had 20 minutes at the end, and Tim decided that if that actually stays on the schedule, it would be a good opportunity to talk about the results of the NETVC hackathon over this past weekend — in case you're interested in that.
So, let me see if I can get a mic going here. I'm going to talk for just a couple of quick slides about the Thor project and the IPR around it, and then after that there will be someone speaking about the actual technical details of Thor, which are far more important. So, next slide, please. So the most important thing — whoops, those are not the slides.
Oh, you need to send me — well, that doesn't really matter much, it's fine. The important thing for me to say — the slide I was expecting — is: we have made an IPR declaration on some of this, so that I don't get any of the, you know, crazy, starry-eyed looks or whatever. Do read the IPR declaration, of course.
Obviously we intend this to go towards the royalty-free stuff, and we'll probably be updating the terms on there over time, because we've been looking at things like the Opus license — the usual Cisco "don't bug us, we won't bug you" type of license. We realized through the whole Opus codec working group process that it was valuable to have all the people who were contributing under a similar license, so we plan to work with people to update the license a little bit over time. So yeah, there's the slide.
Yeah, next slide, I guess. So the way we've been looking at this work and doing it is: we have a technical team that is developing the technical aspects of the codec. Obviously this is a team whose people have worked deeply in codecs for a long time and have a lot of IPR experience to understand the landscape, but they're taking the proposals and passing those over to the legal team.
The legal team is trying to evaluate this, because we don't think it's really possible for us to get towards a royalty-free codec without actually understanding some of these things. The legal team includes some external and internal people — people with strong legal backgrounds, and people with strong video codec IPR backgrounds who have worked in the space a long time. I'll talk a little bit more in the next slide about how they deal with this, but they pass back to the technical team both issues — like, "hey, we think there's a problem here in this type of area,
it might have conflicts with this or that" — and they can also pass back information about IPR that might be useful to solve the same type of problem that they found when searching: IPR that's either old, or IPR they've managed to license in under terms that they think would be acceptable to the working group and meet the sort of royalty-free type terms. So, next slide.
The approach that we're taking to the IPR evaluation is to go gather lots of different patents that we think are worth reviewing and looking at. We do this from a combination of looking at existing patent pools, looking at companies that are well known to have developed IPR in this type of space, and just general searches for words, and so on.
We gather up a big bin of stuff, and from that we need to sort it into the sorts of tools that we're using and applying in the codecs under evaluation, so that we can bin them up. We figure out what our top tools are, and then we evaluate the claims that we found against given types of tools — look at what we have, look at how it's working, and get the feedback from that.
The thing I've missed saying about that: you know, you can never be a hundred percent done gathering. We've made a really strong pass — we've gathered lots of stuff pre-2008, less post-2008, and obviously those are equally relevant — so we're still in the gathering phase too, but we've got a big block. We've gone through a bunch of the tools; we haven't gone through all the tools we've wanted, but the thing that we've been discovering, as we start evaluating these tools, is where we found problems.
This is an ongoing process — you know, we have to do this as we keep iterating the codec and trying it, and we view it as going on over a long period of time. Certainly on the actual approach to the IPR evaluation, we're happy to share the risk in that and do it along with other companies. It's not something we'd expect a working group to do, but if other people want to do that, under the right type of agreements, that's possible.
Certainly we don't want to create an incentive for people to come and join and create non-royalty-free IPR around what we use. So that's basically what we're doing with the IPR on this Thor codec; we're going to present the technical stuff in a bit. If there are any questions on that, I'm glad to answer them.
We want to define a codec that has moderate complexity and can run in real time in software and hardware, and of course it's also possible to extend it to non-real-time purposes. The basic building blocks are well known, so there are no dramatic changes compared to H.264 and H.265 at the very high level. It's the same block structure, with common design elements from other codecs: larger block sizes, transforms, quarter-pixel interpolation. We do have some royalty-free Cisco IPR in the codec.
We are trying to avoid non-royalty-free IPR, but of course, if other companies were to declare their IPR as royalty-free, that could help improve the quality and design of this codec. Next slide. So this is a very high-level block diagram of the encoder. It is exactly the same block diagram that you would find for — that would apply to — H.265 and H.264.
So, at a very high level, the block diagrams are the same — and the same for the decoder on the next slide. So you get the usual stuff: transform coding, entropy coding, intra-frame prediction, loop filters and temporal prediction. Next slide. So let's go into the details. The block structure starts with what we call a super block; that is the block by which you go through the frame in raster scan order.
Next slide. Inter prediction: luma uses quarter-pixel resolution with a six-tap separable interpolation filter, except in the center position, where we have a special non-separable low-pass filter. Some of you might remember back to the early 2000s, when Gisle proposed a special filter in one of the quarter-pixel positions — some people called that the "funny position" — and this is not funny anymore, but it's still a special position with a special low-pass filter, and it gives a coding gain. For chroma it's 1/8-pixel resolution and a four-tap separable filter, and we do support multiple reference frames.
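For illustration only, here is a minimal sketch of how a separable interpolation filter of this kind works in general. The tap values below are generic placeholders (the classic [1, −5, 20, 20, −5, 1]/32 half-pel filter), not Thor's actual coefficients, and the function names are made up for the sketch:

```python
import numpy as np

# Illustrative 6-tap filter (placeholder coefficients, NOT Thor's actual filter).
# A separable interpolation applies the same 1-D filter horizontally, then vertically.
TAPS = np.array([1, -5, 20, 20, -5, 1]) / 32.0

def interp_1d(samples):
    """Filter a 1-D array of integer-pel samples to one sub-pel position."""
    out = np.zeros(len(samples) - 5)
    for i in range(len(out)):
        out[i] = np.dot(TAPS, samples[i:i + 6])
    return out

def interp_2d(block):
    """Separable 2-D interpolation: rows first, then columns."""
    rows = np.array([interp_1d(r) for r in block])
    return np.array([interp_1d(c) for c in rows.T]).T

block = np.random.randint(0, 256, size=(16, 16)).astype(float)
print(interp_2d(block).shape)  # (11, 11): 5 samples of filter support lost per dimension
```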
Next slide. Yeah, transforms — those are the same as in H.265/HEVC, except that we added a 64x64 transform. There is Cisco IPR on the transforms. They are integer approximations to the DCT; there are all sizes from 4x4 up to 64x64, and they have what we call an embedded structure, which means that the elements of the 4x4 transform matrix are a subset of the elements in the 8x8 transform matrix, and so on.
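To make the "embedded structure" idea concrete, here is a small check using HEVC-style core transform matrices (coefficients as commonly quoted for the HEVC 4- and 8-point core transforms — shown only to illustrate the embedding property; Thor's own integer matrices may differ):

```python
import numpy as np

T8 = np.array([
    [64,  64,  64,  64,  64,  64,  64,  64],
    [89,  75,  50,  18, -18, -50, -75, -89],
    [83,  36, -36, -83, -83, -36,  36,  83],
    [75, -18, -89, -50,  50,  89,  18, -75],
    [64, -64, -64,  64,  64, -64, -64,  64],
    [50, -89,  18,  75, -75, -18,  89, -50],
    [36, -83,  83, -36, -36,  83, -83,  36],
    [18, -50,  75, -89,  89, -75,  50, -18],
])
T4 = np.array([
    [64,  64,  64,  64],
    [83,  36, -36, -83],
    [64, -64, -64,  64],
    [36, -83,  83, -36],
])

def is_embedded(small, big):
    """True if `small` equals `big` subsampled in rows and truncated in columns."""
    step = big.shape[0] // small.shape[0]
    return np.array_equal(small, big[::step, :small.shape[1]])

print(is_embedded(T4, T8))  # True: the 4x4 matrix is a subset of the 8x8 one
```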
We are not married to that: if someone wants to contribute a good arithmetic coder, that would be interesting, and that is something we would be happy to consider. But at the moment we have something that is at least very simple, low complexity, and it avoids a lot of IPR — though we would be happy to consider arithmetic coding. Also, as a consequence of the VLC-based approach, some of the block-level parameters need to be coded jointly to get close to the entropy. For the transform coefficient coding,
this is an improved version of what we had in version one of the HEVC reference software. That was removed from the software eventually, because H.265 does not support CAVLC, but we have improved on that scheme since then. So this is what we use for transform coefficient coding right now. Next slide. Yeah — this is on encoder optimizations. This is a non-normative part that we build into our encoder to maximize performance.
Next slide. Yeah, so we have been trying to compare Thor with x265 and VP9, using the HM reference software as the anchor — so three codecs compared to the HM anchor.
The HM is configured with what we call the low-delay configuration, which is no reordering, no look-ahead, and systematic QP variations, so the GOP structure is fixed — it's independent of the content — and Thor is using the exact same constraints. For VP9 and x265 it was a bit more difficult, because it was not possible to configure those codecs to use the exact same GOP structure, so this is not a one-hundred-percent apples-to-apples comparison.
So what you can see here are three different codecs, and for each sequence and each codec there is a number — a bit-rate number — that tells you how many percent extra bits are used compared to the anchor. If you take the average over all the sequences, you get the bottom line, and what you can see is that Thor uses, on average for the same PSNR, twenty-three percent additional bits compared to what VP9 uses.
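To make concrete what a number like "X percent additional bits at the same PSNR" means, here is a simplified sketch. It is not the actual procedure behind the numbers in the table (which presumably uses BD-rate-style tooling); the rate/PSNR points are hypothetical:

```python
import numpy as np

def percent_extra_bits(anchor_rate, anchor_psnr, test_rate, test_psnr):
    """Average percent bitrate overhead of `test` vs. `anchor` at matched PSNR.

    Simplified illustration: interpolate log-rate as a function of PSNR for both
    codecs and average the rate ratio over the overlapping PSNR range.  (The
    usual Bjontegaard-delta procedure fits polynomials instead; this is only a
    sketch of the idea.)
    """
    lo = max(min(anchor_psnr), min(test_psnr))
    hi = min(max(anchor_psnr), max(test_psnr))
    grid = np.linspace(lo, hi, 100)
    log_anchor = np.interp(grid, anchor_psnr, np.log(anchor_rate))
    log_test = np.interp(grid, test_psnr, np.log(test_rate))
    return (np.exp(np.mean(log_test - log_anchor)) - 1.0) * 100.0

# Hypothetical rate (kbps) / PSNR (dB) points for an anchor and a test codec.
anchor = ([500, 1000, 2000, 4000], [33.0, 36.0, 39.0, 42.0])
test   = ([600, 1200, 2400, 4800], [33.0, 36.0, 39.0, 42.0])
print(round(percent_extra_bits(*anchor, *test), 1))  # ~20.0% extra bits
```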
In the table, and on the next page, cpu-used is zero. I have additional results where I change the complexity setting and get different results, but these were the high-complexity operating modes for all the codecs. (Okay, thank you.) But to go back: I have been in contact with people at Google to discuss these settings, and it seems like they have a preference for two-pass encoding, which didn't fit the low-delay constraints — so maybe that explains some of the differences.
So I think this kind of highlights the need, in the testing draft, for a really good understanding of how to do apples-to-apples comparisons among these codecs, and for giving each of the codec teams the onus of putting in the best settings to make their codecs look the most flattering, so that when we do our testing we can be fairly sure we have good results and good numbers out of it.
Yeah. So I should explain this: on the horizontal axis are the bit-rate numbers from the table — at least for the leftmost point on each curve — so this shows the number of bits in addition to the anchor. On the vertical axis is the frame rate for this particular sequence during single-core encoding. There is one curve for each codec, and each codec has multiple operating points, from high complexity on the left to low complexity on the right.
And as you go along the curve, you increase the number of bits, but you also increase the speed of the encoder. Of course the goal is to achieve something towards the upper-left corner of this diagram: the highest possible frame rate and as close to the anchor as possible — maybe even better. This is where we are today. I expect that we will improve in the next few months, certainly on the bandwidth axis, but maybe even more on the vertical axis, because we have only just started the work to optimize for speed.
Question from the floor: I have a question regarding your choice of using B-frames in the low-delay mode. I would suppose people would typically try to, you know, avoid the extra delay of B-frames. Can you explain, if you use them, what extra delay penalty you introduce, and whether that still fits — what is the low-delay part, how low a latency can it support?
Tim Terriberry from Mozilla. With regard to the entropy coding: you're perfectly free to steal the arithmetic coder in Daala. It should actually be fairly easy to convert over, because we support raw bits, which are basically what you're doing now. So you could basically just find your put-bits interface, replace that with a call to ours, and then convert symbol by symbol to using arithmetic coding — so that would probably be relatively straightforward to do.
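A rough illustration of the migration path being described: keep the existing put-bits call sites working by routing raw bits through an entropy-coder adapter, then move symbols over one at a time. All class and method names below are made up for the sketch; they are not Daala's or Thor's actual APIs:

```python
class EntropyCoderAdapter:
    """Same put_bits() signature as a raw bit writer, so old call sites don't
    change; new code can call encode_symbol() with a probability model instead."""
    def __init__(self, coder):
        self.coder = coder

    def put_bits(self, value, nbits):
        # Raw bits are just symbols coded with a uniform (50/50) model.
        for i in range(nbits):
            self.coder.encode_symbol((value >> (nbits - 1 - i)) & 1, p_one=0.5)

    def encode_symbol(self, symbol, p_one):
        self.coder.encode_symbol(symbol, p_one)

class ToyCoder:
    """Placeholder for a real arithmetic/range coder."""
    def __init__(self):
        self.symbols = []
    def encode_symbol(self, symbol, p_one):
        self.symbols.append((symbol, p_one))

writer = EntropyCoderAdapter(ToyCoder())
writer.put_bits(0b101, 3)              # old call site, unchanged
writer.encode_symbol(1, p_one=0.9)     # new call site, modeled symbol
print(writer.coder.symbols)
```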
Steve Botzko from Polycom — not a question exactly, just a comment. I think it's great that we have another candidate; it looks like a wonderful piece of work. I like the fact that you're vetting the IPR so carefully, and I like the attention to medium complexity. I think this will be very, very helpful for us to be evaluating.
So, a chair point on the multiple candidates: I think we're pretty clear that the output of the working group is going to be a single codec, all right? So we need to make sure that we understand all the candidates and all their best parts, and figure out the right technical solutions to arrive at a single NETVC codec at the end of the working group's work.
All right, so my name is Nathan Egge, I'm from Mozilla. Today I'll be talking to you about a draft that we submitted around one of the coding tools we're using in Daala. Next slide. So today I'll be talking about lapped transforms, which are not a new idea: they were originally proposed for still-image coding by Henrique Malvar in 1989. The idea is to apply a pre-filter across block boundaries
— one that's invertible and that removes spatial correlation between the blocks. This pre-filter has two benefits: the idea is to improve coding performance, but also to have something that can be applied on the decode side to remove blocking artifacts. This was originally used in audio, and the idea there was that blocking artifacts end up being very audible, so they needed a technique like this. It was not widely adopted in video because of some of the problems that come up with it.
Let's go to the next slide. Now I'll show you what this looks like. The pre-filter, if you apply it to just an image, ends up decorrelating adjacent blocks, and so your image ends up being blocky — that's what's shown at the top. If you compare what happens when you take an original image and apply just the DCT and quantization, you can see you get these blocking artifacts along block boundaries. If you take the same image, apply a pre-filter, the DCT and the same quantization, and then inverse the DCT and invert the pre-filter, the blocking artifacts are much reduced.
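Here is a minimal 1-D sketch of that pipeline — an invertible pre-filter applied to the samples straddling each block boundary, a blockwise DCT with quantization, then the inverse DCT and the post-filter. The 2x2 rotation used as the boundary filter is a toy choice for illustration; these are not Daala's actual lapping filters:

```python
import numpy as np
from scipy.fft import dct, idct

BLOCK = 4
theta = np.pi / 8
PRE = np.array([[np.cos(theta), np.sin(theta)],
                [-np.sin(theta), np.cos(theta)]])   # invertible boundary filter
POST = PRE.T                                        # inverse of a rotation

def prefilter(x):
    y = x.astype(float).copy()
    for b in range(BLOCK, len(x), BLOCK):           # every interior block boundary
        y[b - 1:b + 1] = PRE @ y[b - 1:b + 1]
    return y

def postfilter(x):
    y = x.copy()
    for b in range(BLOCK, len(x), BLOCK):
        y[b - 1:b + 1] = POST @ y[b - 1:b + 1]
    return y

def encode_decode(x, qstep):
    y = prefilter(x)
    coeffs = dct(y.reshape(-1, BLOCK), norm='ortho', axis=1)
    coeffs = np.round(coeffs / qstep) * qstep        # quantize per block
    rec = idct(coeffs, norm='ortho', axis=1).reshape(-1)
    return postfilter(rec)

x = np.linspace(0, 255, 32)                          # smooth ramp across 8 blocks
print(np.abs(encode_decode(x, qstep=20) - x).max())
```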
So let me describe how we use these in Daala, on the next slide. Excuse me. So yeah, the pros we have here using lapped transforms are that you have a larger spatial extent, because we're doing this lapped transform that crosses block boundaries: the total transform, which is the lapped transform plus the DCT, has a larger support area, and so we end up getting an improved coding gain just by using the lapped transforms. And we did an experiment:
we took data from a set of still images. It's a comparison where we were using the KLT for 4x4 blocks everywhere, as compared to the DCT, and you can see that the DCT gets similar performance when we apply lapped transforms as the KLT does — we get the same kind of gain. What's really fascinating about this is that on the 4x4 blocks — on smaller blocks — the benefit is almost a decibel; as you go to larger blocks, that kind of falls off, so at 16x16…
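For readers who want "coding gain" made concrete, the standard transform coding-gain measure for a stationary source is the ratio of the arithmetic to the geometric mean of the transform-coefficient variances. The sketch below evaluates it for the 4-point DCT on an AR(1) model; this is the textbook definition, not necessarily the exact methodology behind the Daala figures:

```python
import numpy as np
from scipy.fft import dct

def ar1_covariance(n, rho=0.95):
    """Covariance matrix of a unit-variance AR(1) source with correlation rho."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def coding_gain_db(transform_rows, rho=0.95):
    """Arithmetic/geometric mean ratio of coefficient variances, in dB."""
    R = ar1_covariance(transform_rows.shape[0], rho)
    var = np.diag(transform_rows @ R @ transform_rows.T)
    return 10 * np.log10(var.mean() / np.exp(np.mean(np.log(var))))

T4 = dct(np.eye(4), norm='ortho', axis=0)   # rows are the 4-point DCT basis functions
print(round(coding_gain_db(T4), 2))
```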
One of the cons is that the neighboring pixels are no longer available to the decoder, and so being able to predict a block from its spatial surroundings doesn't work. So in Daala we had to come up with other techniques for that, which do not have the same direct benefit as doing spatial prediction. All right, next slide. So what we currently have support for in Daala is the following block sizes: 4x4 up to 64x64, which is in progress.
Yes — so, one little thing: we can use 4-point and 8-point filters. We apply 8-point filters across larger blocks and 4-point filters across the 4x4 blocks; when we split an 8x8 block down to four 4x4s, we then apply a four-point filter on the interior edges, and I'll show a demonstration of that shortly. For chroma in 4:4:4,
we do exactly the same thing as in luma. When it's 4:2:0, it'll use a four-point filter everywhere, and this is so that the filters along block boundaries have the same spatial extent, because we do things like chroma-from-luma and other prediction. And then the important thing to note is that the lapping size does not depend on the neighbor's block size. So as you recurse through your block-size decision, changing your neighbor's split decision doesn't impact the lapping along edges, and this allows us to do the block-size search efficiently.
So this is the order in which we apply filters, starting at the super block level — in Daala we currently have 32x32 superblocks, and will be moving to 64x64 superblocks. You apply the 8-point filter across the top and bottom edges; next slide, you then apply it across the left and right edges; and then, next, you apply it across the horizontal edge and the vertical edge, and now you can recurse and do the same technique for all the interior edges of your blocks below that.
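A rough sketch of that edge-visiting order for one superblock. The dict/list split-tree representation and the filter-length rule are simplifications for illustration, not Daala's actual data structures:

```python
def lap_edges(x0, y0, size, split):
    """Interior edges of a (possibly split) block, in lapping order:
    horizontal interior edge, vertical interior edge, then recurse into children."""
    if not split:
        return
    half = size // 2
    flen = 4 if half == 4 else 8          # 4-point filters once edges reach 4x4 blocks
    yield ('h-interior', (x0, y0 + half), size, flen)
    yield ('v-interior', (x0 + half, y0), size, flen)
    children = [(x0, y0), (x0 + half, y0), (x0, y0 + half), (x0 + half, y0 + half)]
    for (cx, cy), child in zip(children, split):
        yield from lap_edges(cx, cy, half, child)

def superblock_edges(size, split):
    """Outer edges first (top/bottom, then left/right), then interior edges."""
    yield ('top', (0, 0), size, 8)
    yield ('bottom', (0, size), size, 8)
    yield ('left', (0, 0), size, 8)
    yield ('right', (size, 0), size, 8)
    yield from lap_edges(0, 0, size, split)

# 32x32 superblock whose top-left 16x16 child is split again into four 8x8s.
tree = [[None, None, None, None], None, None, None]
for edge in superblock_edges(32, tree):
    print(edge)
```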
They have perfect reconstruction — that is, for any input x, applying the forward lapped transform followed by the inverse lapped transform gives back the same value of x. This is important because, when you have similar content frame by frame, you would like the same values to come out of your inverse transform so that there's no rounding error that accumulates. The next item is that they are biorthogonal, which means that the lapped transforms do introduce some scale factors.
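The perfect-reconstruction property is easy to demonstrate with an integer lifting-style pre/post filter pair, where each lifting step is undone exactly. The filter below is a toy, chosen only so the round trip is bit-exact; it is not Daala's actual lapping filter:

```python
import numpy as np

def pre_pair(a, b):
    b = b - a
    a = a + (b >> 1)
    return a, b

def post_pair(a, b):
    a = a - (b >> 1)
    b = b + a
    return a, b

def forward(x, block=4):
    y = list(x)
    for i in range(block, len(y), block):      # pair straddling each block boundary
        y[i - 1], y[i] = pre_pair(y[i - 1], y[i])
    return y

def inverse(y, block=4):
    x = list(y)
    for i in range(block, len(x), block):
        x[i - 1], x[i] = post_pair(x[i - 1], x[i])
    return x

x = [int(v) for v in np.random.randint(0, 256, 16)]
assert inverse(forward(x)) == x                # exact round trip, no rounding drift
print("perfect reconstruction holds")
```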
With these scale factors you have some correlation between the coefficients, and we're not exploiting that correlation. There's a dynamic range expansion from using these lapped transforms. The core DCT we designed as an orthonormal transform, which means there's minimal range expansion from that; the pre- and post-filters add about one or two bits, depending on the scale factors. Because the DCT is done with an integer approximation, for very small inputs there's a lot of rounding.
So we get around this by scaling all of our inputs up by four bits and then, on the inverse side, once we've done the inverse transform, scaling them back down. This has the effect that, for blocks larger than sixteen by sixteen, we can no longer fit coefficients in 16 bits, and that will have an impact on SIMD, perhaps, when we get to doing that optimization. Next slide — and that's it.
So I think there are two things. One is that, you know, in the audio field this is used extensively, everywhere, because of the reduction in blocking artifacts, and so we already kind of had an idea that this might be something interesting. We thought that the coding gain might be a bigger deal, but it turns out that on some of the larger blocks the improvement from using the lapped transform was reduced, and in particular moving to these 4-point and 8-point filters —
Jean-Marc from Mozilla. Actually, in some sense the coding gains reported here are kind of misleading, in the sense that, especially for large blocks, in practice large blocks are actually where lapping benefits the most, because of the reduced blocking artifacts. For example, the 8-point lapping is wider than typical adaptive loop filters, so it creates less blocking on larger blocks, despite the fact that the coding gains that are measured — which are like theoretical measurements — do not show that. And even on smaller blocks…
Mo Zanaty again, at the mic. So what Jean-Marc just said struck me: it may have better subjective performance than some objective metrics currently show. Does that highlight the need, in the testing draft, to try to find metrics that capture artifacts like blockiness better?
So — among the metrics we currently use for Daala on Are We Compressed Yet, one of them is called Fast MS-SSIM, and it is very sensitive to blocking artifacts. It has its own issues — it is by no means perfect — but I believe having many metrics is a good thing, and at least they can help: when one behaves very differently from the others, it's at least a sign that one should pay very close attention.
Yeah, so Tim Terriberry, Mozilla, again — just responding to Mo. So Google has a metric that they designed to detect blockiness, and I'm sure there are plenty of other ones around, but that particular one is open source and we've, you know, looked at integrating it into our Are We Compressed Yet tool. It just hasn't happened yet, but we plan to do so.
The draft that I submitted is implemented in the context of the Daala codec, but it would actually be applicable to pretty much any other codec, because it's a completely separate coding technique. Next slide. So there are several properties that screencasting content has compared to normal photographic-type video.
This is a possibly non-exhaustive list. The first property is the only one that I really addressed in this draft, and it is being able to properly encode anti-aliased text without making too much of a mess out of it — and I'm talking about encoding in the pixel domain; I'm not addressing any sort of vector side channel. There are other special properties: for example, the content tends to have many horizontal and vertical edges, like window borders and things like that.
It tends to have a reduced number of colors — at least in many blocks it does. Also, in terms of motion, the motion tends to be rectangular, because people move windows around, so that is also very common. I did not address any of these except for the first one; if you can think of others that we should consider, then I'd like to know. Yeah.
Jonathan Lennox: The other thing — I think the temporal qualities of screencasting are also very interesting, because you tend to have nothing happening for a very long time, but then all of a sudden everything changes at once. Basically, if you hit the next slide, boom — other than the white background, all the pixels just changed; but before that it was completely static for 15 seconds or a minute or something. So I think that's the other interesting difference with screencasting. Yes.
Mo Zanaty again, at the mic. I think when people say screencasting, there's usually a very varied perspective of what the word means — maybe we should highlight this more clearly in the requirements draft — but it could mean anything from what we think of as presentation casting to, you know, wireless display. And of course in wireless display, all bets are off as to what the content actually is; it most likely is not just static — people don't present to themselves.
I
Well,
one
thing
implicit
that
I
should
have
mentioned
also
is
that
all
of
this
should
probably
be
on
a
like
switchable
inside
a
particular
frame
like
obviously,
if
you
do
remote
desktop,
you
might
have
a
with
a
movie
playing
somewhere,
and
you
want
to
be
able
to
code
both
in
the
same
frame.
So
I
think
this
probably
addresses.
Next slide, please. Okay, so the approach that I'm presenting here is based on the Haar wavelet. It is the very simplest wavelet one could possibly use, and it is absolutely terrible for use on any sort of natural images; however, it has interesting properties for the very specific case of screencasting.
So, very quickly: on the left you can see the actual mathematical definition of the Haar transform, for a one-dimensional, two-point transform; on the right is how the decomposition works, in the example of a 2-D transform over four by four points. So you have only a single DC over that block, and you have very localized basis functions, especially for the high frequencies — and this is the idea: for text you want to reduce ringing, so you can do it that way.
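As a sketch of what that decomposition does (using the standard orthonormal two-point Haar, s = (a + b)/√2 and d = (a − b)/√2, applied along rows and columns and then recursed on the low-pass quadrant — the draft's actual integerized transform may differ):

```python
import numpy as np

def haar_1d(x):
    a, b = x[0::2], x[1::2]
    return np.concatenate((a + b, a - b)) / np.sqrt(2.0)

def haar_2d(block):
    out = block.astype(float).copy()
    n = block.shape[0]
    while n > 1:
        sub = out[:n, :n]
        sub = np.apply_along_axis(haar_1d, 1, sub)   # rows
        sub = np.apply_along_axis(haar_1d, 0, sub)   # columns
        out[:n, :n] = sub
        n //= 2                                      # recurse on the LL quadrant
    return out

block = np.arange(16, dtype=float).reshape(4, 4)
coeffs = haar_2d(block)
print(round(coeffs[0, 0], 3))   # single DC term: block mean times block size
```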
So once we have the actual transform, we need to encode the quantized coefficients, and this is done with what I call the L1-tree wavelet encoding. This is based on the tree structure of the wavelet transform, kind of similar to other tree-based techniques like EZW, the embedded zerotree wavelet.
In this case, the main difference is that the tree is based on the sum of the absolute values over the entire tree. So the very first thing we encode is the sum of the absolute values of the coefficients for the entire block, and then we say how this sum is distributed between horizontal, diagonal and vertical. Then, knowing what the sum is, for each direction we start with the top-level tree and we recurse down, saying how the sum is split between the parent coefficient and the four children coefficients.
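A rough sketch of that value decomposition — total sum first, then how each subtree's sum splits between the parent coefficient and its four children. Only the sequence of signalled quantities is shown, with a hypothetical flat tree indexing; the actual entropy coding in the draft is not reproduced here:

```python
def children(node, n):
    # Hypothetical flat indexing: node k has children 4k+1 .. 4k+4.
    return [k for k in range(4 * node + 1, 4 * node + 5) if k < n]

def subtree_sum(coeffs, node):
    return abs(coeffs[node]) + sum(subtree_sum(coeffs, k) for k in children(node, len(coeffs)))

def encode_tree(coeffs, node, messages):
    total = subtree_sum(coeffs, node)
    kids = children(node, len(coeffs))
    if total == 0 or not kids:
        return
    # Signal how the subtree sum splits between this coefficient and each child subtree.
    split = [abs(coeffs[node])] + [subtree_sum(coeffs, k) for k in kids]
    messages.append((node, total, split))
    for k in kids:
        encode_tree(coeffs, k, messages)

coeffs = [3, 0, 2, 0, 1, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
msgs = []
total = sum(abs(c) for c in coeffs)   # first signalled quantity: sum for the block
encode_tree(coeffs, 0, msgs)
print(total, msgs[:3])
```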
Next slide. So I hope this shows up relatively clearly. This is really a magnified image of a screenshot that I took — the fact that it's magnified may change some artifacts, but it should give an idea. I'm going to show four images at exactly the same rate; this is a crop of a much larger image that was encoded at the same size for all the different codecs. So this is what we get right now with JPEG. Next.
This is what we get right now — or a few weeks ago — with the Daala lapped-transform-based encoder. So it's not really great in terms of ringing: lapping is really good for many things, but for text it is terrible. Next slide. So this is what we can get with the simple Haar scheme that I just presented.
So this is what we get with Haar, which has much less ringing around the text, and you can compare with this, which is x265 — on text-only images it actually performed slightly worse than Haar. However, in this case it's better, because it handles all of the long lines much better than the Haar transform, which is very localized; especially the icons look a lot better in x265, but at least the text with the Haar transform looks pretty good. Next slide.
So in terms of objective evaluation, there is currently a screenshot test set in our Are We Compressed Yet, composed of about a dozen screenshots or so. It is very preliminary and can probably be improved a lot; if some of you have a better test set that we can use, that would be very much appreciated. What we have is just random things
we got from Wikipedia — screenshots that weren't compressed before. In terms of metrics on Are We Compressed Yet, we have PSNR, PSNR-HVS, SSIM and Fast MS-SSIM, and at this point it is not clear whether any of these is any good on screenshots. So far it appears that PSNR-HVS is the least wrong, but I would not trust these metrics very much at this time, so I prefer looking at it, so far.
So, two comments. One is regarding the visual comparison you showed: I would have assumed that people might have preferred x264 — versus the Haar — as you explained, even though you have text, you know, sort of screen sharing with some text. But apparently x265 — sorry — handles the mixture of content very well, even when you're just sharing a document. Yeah.
So right now the comparisons that I'm showing here are based on applying the Haar transform everywhere in the image, which is actually not that good in many cases — for example, there are places where there are gradients and things like that, which are really not well handled by Haar — so overall x265 actually does better than what I'm presenting here.
Out of the test set we have on Are We Compressed Yet, the only image where the Haar transform actually performed better than x265 was the one that has only a lot of text and nothing else — on that one we actually perform better. On the image here, what happens is that the Haar coder needs to spend a lot more bits on the icons and things like that, so it has to spend fewer bits than x265 on the text, and it ends up slightly worse. And again —
Absolutely — this is an absolute requirement, because you cannot assume that the entire frame will be made of text. Some parts will be icons and lines, and some parts will be just natural content, and we need to be able to handle all of this — which is not at all implemented at this point.
The second comment is regarding the metrics that you've been trying so far. Based on my understanding, PSNR is a purely statistical one, and aside from that, all the others — the SSIMs — are really designed to reflect statistics from natural images. So that's probably why, from the very start, they are not good candidates to consider. It would be nice to consider things which match the visual characteristics of screen sharing — things like the reduced color space, straight lines, etc. Yeah.
Mo Zanaty again, at the mic. So I was surprised that you were targeting anti-aliased text with Haar, because I actually would have thought that just straight lines, or non-anti-aliased text, would be where Haar outperforms things like x264 encoding — the more anti-aliasing you do, typically the better that encoding ends up looking. Have you tried this on something like an Excel spreadsheet, with lots of sharp horizontal and vertical lines, and maybe non-anti-aliased text?
Yeah — so, actually, things like spreadsheets are one of the worst issues we have right now with the Haar transform. If you go back to the slide where I'm showing the basis functions — I believe this is the third slide or something — yeah, here. What happens is that our Haar high-frequency basis functions are very narrow, so if you have a line that spans the entire block, we have an entire line of nonzero coefficients, whereas with the DCT you can represent this more compactly. So this is one area of improvement
that I'm looking at right now: how to either extend the transform or use a different decomposition, to be able to have a more compact representation for purely horizontal lines. This would very much help — essentially, the place where x265 does best compared to what I'm presenting here is actually spreadsheets, because there are lots of horizontal and vertical lines, which the DCT is not so bad at representing and Haar is absolutely terrible at. So this is a known place where what I'm presenting can actually be improved.
All right, thank you. So this weekend we had a hackathon — well, for a large number of working groups, but one of them was NETVC — and there we ran a bunch of experiments on both Thor and Daala, and I thought that would be of interest to this group. So the first thing we tried to do was integrate Thor into Are We Compressed Yet, which is the website that we use to test Daala. We had to disable B-frame support, which seems to be okay, since it's not very well tuned anyway.
But the reason for that is that the current implementation requires the frame count of the video to be a multiple of the GOP size, which is currently 12 frames, and not all of our videos actually met that requirement, so that would have screwed up our numbers — but we'll get that resolved eventually. So, next slide. So here's a comparison of the two codecs via PSNR: the muddy yellow one, or whatever it is, at the top is Thor, and the blue line underneath it is Daala.
So it's currently showing that Thor has a 43.5 percent rate advantage over Daala on PSNR, which is not too surprising, since we have not optimized for PSNR — in fact, we've intentionally done lots of things that make PSNR worse. Two of those things are that we use quantization matrices and activity masking, and those are relatively easy to shut off. So we did — next slide — and that reduced the gap to about twenty-three point six percent, which, you know, is a sizable amount, but by no means all of it.
So according to PSNR, Thor is doing much better. Next slide: on PSNR-HVS the results are a bit more mixed. The BD-rate difference is less than one percent, but you can see that Thor is doing better at the low rates and we are doing better at high rates — in particular we're doing better at rates that are probably so high they're not actually practical, but —
But again, it's more mixed. Then, next slide: if we look at Fast MS-SSIM instead, the story is the complete opposite — it says Daala is ninety-one percent better than Thor, basically across all rates. So I don't know what any of this means. This will probably involve actually spending some time looking at images and videos, as opposed to staring at curves, to figure out who's doing better in what scenarios and what conditions, etc.
But the good news is that the two contributions do appear to perform very differently, so, you know, we may be able to take the best of both and wind up in a much better place. So, next slide.
So then we wanted to start to understand in a little bit more detail what is responsible for the differences in performance, and so one experiment we tried was to basically take Daala, rip out all of Daala's motion compensation and replace it with Thor's. The motion compensation in Daala is relatively decoupled, so it was actually a fairly easy experiment to run. We ran four different variations of this, which I'll describe in the next few slides.
So if you recall the block diagram for Daala from Monday's session, we basically took the OBMC block right there — next slide — and replaced that with Thor. So everything else is still running and still using Daala: Thor forms the prediction frame, and we run lapped transforms on that prediction and lapped transforms on the inputs, and both of those go to PVQ and use all our quantization and entropy coding and all that. So, again, because this is running in Daala, it doesn't do any multiple references.
So for the first experiment, in Thor we disabled residual coding, because we're trying to use this to make a prediction for Daala; we disabled all of the intra modes, because Daala does not have intra modes in its motion compensation; and we disabled 64x64 blocks, because Daala's motion compensation only goes up to 32x32. And you can see from that that Daala is about 24 to 28 percent better with a basically unmodified Thor, except for just shutting these things off. But this is not really a fair comparison.
So, in this case, the yellow line there is the Thor one, which is underneath the blue one, so Daala is better. But this is, as I said, not really a fair comparison, because Thor is spending a bunch of bits in order to encode the possibility that it might use some of these modes and things that we've disabled. So the next thing I did was go around and basically disable the coding bits for those things.
So it's now actually much closer — within ten percent of where Daala is — and that was the second experiment we ran. Typically you would expect OBMC to be doing about one decibel better, which, on the first slide, was roughly where we were; but it turns out, when you disable all these bits, it's actually much closer than you might expect. So maybe there's room for improvement for us there. So, next slide. So the third experiment we ran was to re-enable the intra modes and add them back to the VLC,
so that we actually had the cost for them. And I should point out, by the way, that for all these experiments I didn't actually write a decoder — this is all changing the encoder only, so I could very easily have screwed something up in all this stuff. But we re-enabled these intra modes, and the numbers here at the top are comparing the previous Thor experiment to the current Thor experiment.
This is not Thor versus Daala anymore; it says that just turning on those intra modes — so that we have some kind of intra prediction in our motion compensation, for places where the motion estimation is not going to do a good job — makes less than a two percent difference, which was somewhat reassuring, since the fact that we don't have this in Daala has been something that we thought might be a severe performance limitation for a long time.
The actual performance limitation in Daala is probably a bit higher than this, for various different reasons, but this at least gives you some kind of ballpark — and it was a 15-minute experiment. So it's nice to have a nice, small and simple codebase like Thor to run some of these on; doing this experiment in Daala would have been much more difficult. So, next slide.
So the final experiment was basically the "cheating" experiment: this turns on all the things in Thor that it would make sense to turn on — specifically 64x64 blocks, and also coding the splits for those again — and the surprising thing, to me at least, was that this still makes between a seven and twelve percent difference, depending on which metric you look at, and all of that mostly shows up at low rates.
The slightly curious thing is that at very high rates you can actually see that this made things worse for Thor. So again, these are numbers comparing the previous Thor experiment to this Thor experiment; but if you go to the next slide, this actually puts Thor ahead of Daala when you turn all of these things on. So that suggests there's some room to improve Daala by adding 64x64 blocks, and hopefully we can get similar improvements out of it. All right, next slide.
So what we did at the hackathon was just a very simple hack, where we have no signaling: we decide on a superblock-by-superblock basis whether to enable the constrained low-pass filter, but we didn't actually code anything to say whether or not we were doing that. We have better patches now that are showing real gains, but I don't have them on these slides.
But the nice thing about this is that it actually solves a long-standing quilting artifact that we had observed in fades at low rates, which was actually pointed out to us by Thomas Davies. So, next slide — good — and it actually shows up on the screen. So if you have a fade to or from black, at very low rates we would get what you can see here: these quilting artifacts; they actually show up.
So that looks very not-wonderful, and we had done some analysis and understood why this was happening — which is complicated, and I can go into it in detail if you want, but you probably don't want me to. The best solution that we had come up with previous to this was to switch to 12-bit reference frames. As Nathan mentioned, we scale up all of our pixels by 16 before running them through our transforms, and then we scale them back down to 8 bits before storing them in our reference frames.
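A small illustration of the precision difference being discussed — Daala works internally on samples scaled up by 16 (4 extra bits), and the question is whether reference frames keep that precision (12-bit storage) or get rounded back down to 8 bits. This is only a precision illustration, not the full explanation of the quilting artifact, which the presenter describes as more complicated:

```python
def to_internal(pix8):            # 8-bit pixel -> internal 12-bit scale
    return pix8 * 16

def store_ref_8bit(internal):     # round back down to 8 bits for the reference
    return (internal + 8) // 16

def store_ref_12bit(internal):    # keep full internal precision
    return internal

# A very slow fade: the internal value creeps up by a quarter of an 8-bit level per frame.
internal = to_internal(100)
for frame in range(8):
    internal += 4
    print(frame, store_ref_8bit(internal), store_ref_12bit(internal))
# The 8-bit reference only moves once every few frames, in visible jumps,
# while the 12-bit reference tracks the fade smoothly.
```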
Well, if you don't do that scaling back down, this artifact goes away — but that also doubles the amount of memory you need for your reference frames, increases your memory bandwidth, and has lots of other bad effects. So we thought maybe this constrained low-pass filter would be able to solve the problem without that expense. And indeed, if you go to the next slide, it makes this basically go away. So that was a nice positive development, and those were the results from our hackathon. So — are there any questions?