From YouTube: IETF100-NETVC-20171115-0930
Description
NETVC meeting session at IETF100
2017/11/15 0930
https://datatracker.ietf.org/meeting/100/proceedings/
C: So, first off, the Note Well. Everybody should be familiar with this, but for this particular working group, please make absolutely sure you're familiar with it, because the goal of this working group is to produce a royalty-free video codec. So we definitely need everyone to pay very close attention to the IPR rules in the Note Well.
C: Anyone on Meetecho want to volunteer for taking notes?
C: All right, so a quick agenda bash. I'm just going to spend a few minutes, probably not ten, on the status of the current documents, and then Thomas is going to give us an update on the test document, and Steinar will give us an update on Thor and AV1 progress and comparisons, followed by Tim giving an update on the Daala transforms in AV1, and then Luc is going to give us the latest on chroma prediction from luma as it's being used in AV1. Any changes people want to make to the agenda? Any other items to bring up?
C: We have a milestone for the requirements document. It was working-group last called, but because it's being used by some other standards body or industry consortium, there may be some substantive changes to it that we foresee coming up pretty soon. So we're holding off on the shepherd's write-up for now, and we're waiting to see if there are going to be substantive changes before we progress it on to the IESG. Hopefully that will conclude within November, so we're going to update the milestone to November; I hope to get that done before the end of this month. And we're going to update the testing document milestone as well, because it's not going to be concluded anytime soon.
C: We decided to keep it alive as a living document while the codec candidates are progressing, because we expect the test methodology is going to keep evolving, and we don't want to freeze the document too early. Once we have a candidate we're comfortable with, and a testing methodology we're comfortable with as being pretty stable, then we'll move it forward.
C: For the actual codec candidates, we want a single merged codec, and we lack that right now, so we do not have a candidate yet for the milestone for either the codec spec or the reference implementation. We lack a merged candidate or merged codebase. You'll see Tim and Steinar presenting different tidbits, but not one consolidated codebase or standard, so that's the glaring issue.
C: We need to deal with that in this working group and try to bring it to closure. That milestone is going to get pushed out to July, so hopefully by then we'll have more clarity on how to come up with a converged candidate. And then finally, there's the milestone for carrying this codec inside of containers and storage formats. We'll push that out to the end of 2018, because that work hasn't even started yet.
D: There are a few small command-line changes and test changes. Next slide. The first one is CLI parameters: we've made some small changes to how the codecs we test are invoked. One is that we removed the lag-in-frames constraint. We previously had forced this to 25, but it turned out that was a less-than-ideal value; it basically buffered way more frames than we technically needed to on the encoder, so we decided to drop that out of the command line. As you can see, the command line is pasted at the bottom there.
D: We just leave it up to the encoder to pick the maximum reasonable value, which I think is nineteen in the current encoder. The general idea with the command lines is that we want as few constraints as possible: we impose just the minimum constraints and let the encoder pick the rest, so that's a step in the right direction.
D: So that's it for that slide. The next issue is that previously we basically only allowed the very slowest encoder modes to be used in the test runs. This was the default for Thor and, I think, Daala too, where we had a text-file configuration, and for AV1 that meant cpu-used=0. This became increasingly problematic, as AV1 in particular got much, much slower over time on the current objective-1-fast suite of videos.
D: This is certainly not ideal, and I'm working on a better solution. I think a better solution will actually involve custom speed parameters to the encoder that are better tuned to match real usage. The one bad thing is that, by searching fewer partition types and sizes, we kind of bias towards smaller partitions and away from the rectangular partitions, which could affect some tools in a negative way. So we're trying to find something better, but this is a stopgap for now.
C: This is Mo, virtually, from the floor mic, and I'll be lazy since there's no queue and just stay up here. The testing methodology itself, though, not the infrastructure, the testing methodology itself still specifies that the testing should be done at the maximum-compression setting, not any other mode.
D: Yeah. There's basically a section at the end of the testing draft that lists the different things you can test: you can do subjective tests, you can do cross-codec comparisons, and there are objective tests for tools. That last one is the one where we allow not using the very slowest mode, basically by turning off the extended partition search.
C: Okay. I mean, all codecs will have knobs to adjust the compression/speed trade-off. I think it's important for the testing document to state that the objective comparisons will be done at max compression. Otherwise it's pretty hard: it's another dimension, another layer of curves, that we have to look at to evaluate performance. I know we've presented some of those, like Steinar's, and they're useful.
D: At the same time, the one thing is: if I specify this, I don't want to evaluate the candidates in terms of, you know, meeting the requirements criteria with it. This is purely for individual tools and drafts. In particular, I think people later will probably present results with various cpu-used settings, and I would like a way to normalize this, so that people use the same cpu-used settings and such if they need them for speed reasons. But yeah.
F: Jonathan Lennox. I mean, it seems like with any codec, your encoder can go absurdly slow, right? I've heard the phrase "frames per day" as the speed. In the extreme, anything could do an exhaustive search of every possible bit stream to see which one best matches.
D: Any other strong opinions, one way or the other? Based on the current feedback, it sounds like I should specify this, but maybe only for the objective tool testing, and maybe not be too explicit about the exact parameters. So I'll change it, basically, so that it allows us to do this but doesn't specify the exact parameters; that seems like the middle ground. If anyone objects to that, let me know; otherwise I'll do that.
H: I have some updated charts on the performance and compression trade-offs. This time the Thor GitHub has actually been updated; it hadn't been updated for a while, so I guess it was about time. The main change is that I've added support for the CDEF filter, which has been adopted in AV1.
H: What we had before was that we didn't signal a strength for a filter block if all the coding blocks within that filter block were skipped. The trouble with that is that you have to decode the entire filter block in order to know whether all the blocks are skipped, and the hardware people don't like that: they want to start filtering a coded block as soon as possible, not wait until the end, because they don't like the buffering.
H: So it was agreed to change that. Now there's no signaling for the filter block only if the coding block size is 64x64, meaning there's no partitioning of that block, and that coding block is skipped. That makes it possible to signal the filter strength just after the skip flag for that block.
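As a rough illustration of the parse ordering this enables (a hedged sketch, not the actual AV1 bitstream syntax; the reader helpers are hypothetical):

```c
/* Hedged sketch of the ordering described above: for a 64x64 filter block
 * that is a single, unpartitioned coding block, the decoder knows right
 * after the skip flag whether a CDEF strength follows. The reader helpers
 * below are hypothetical stand-ins, not libaom functions. */
int read_skip_flag(void);
int read_cdef_strength(void);

typedef struct { int skip; int cdef_strength; } FilterBlock64;

static void parse_filter_block_64x64(FilterBlock64 *fb) {
  fb->skip = read_skip_flag();
  if (fb->skip) {
    fb->cdef_strength = -1;  /* whole unit skipped: nothing more to read */
  } else {
    /* Strength comes immediately after the skip flag, so filtering can
     * start without waiting for the rest of the filter block. */
    fb->cdef_strength = read_cdef_strength();
  }
}
```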
H: CDEF still needs the signaling at 64x64 resolution. During the development of CLPF, which was merged with the Daala deringing filter to form CDEF, I tried different block sizes, and 64x64 was by far the best size, so we don't want to change that. For the large 128x128 superblock we'll signal up to four presets in order to keep the same filter block size; the details of how to do this haven't been decided yet.
H: We see quite large gains for chroma, over 4%. I'm not quite sure why we didn't see a gain that high in AV1, so I think I'll investigate that. It does add more complexity; it's more processing, but that's not unexpected. I also tried running CLPF on top of CDEF.
H: Currently that's a way to speed things up, but I think it's much better to do work on the CDEF RDO instead. I think it might be hard to make it as fast as CLPF, but it should be possible to come close without too big losses. Next slide. Another change since the last meeting: in AV1 we have three filters applied in cascade, deblocking, CDEF, and then loop restoration.
H: That adds a lot of buffer requirements, and again, hardware people don't like that. So there was a new proposal from Arm, together with contributions from Google and Mozilla, to reduce the buffer requirements. Without it, there's a minimum of 30 lines of band buffers, but with this new proposal it's possible to reduce that to 16 lines. Next slide, please.
H: The basic idea is that there are some normative changes and some non-normative changes. Non-normatively, it's possible to do some shifting of the CDEF filtering. But the main normative change is that when loop restoration looks outside the superblock that was processed by CDEF, it will read the deblocked output instead of the CDEF output, and that breaks the dependency between CDEF and loop restoration.
H: A bit more on the encoder complexity of CDEF. As I mentioned, I was working on simplifying the RDO, and I think that can be improved even more. Just as a test of how far I could get, I tried restricting CDEF to do no block-level signaling, and when I do that I still get objective gains similar to CLPF. I think in that case the subjective gains are still much better than CLPF, because we still get the directional part of CDEF.
H: In that case the encoder just has to select the optimal strength for the entire frame, which is a quite small search space. Some other simplifications that I have tried, which work well, are to select the damping used in the filter core based on the frame QP, and to decide the number of bits to use per block based on the bitrate and frame type. I think there are still many ways to improve the CDEF RDO.
H: These are the results I got for adding CDEF in Thor, comparing deblocking only against deblocking plus CDEF. In the low-complexity case it's now 6.2 percent, and the chroma is even better: if I look at the CIEDE number it's actually 10.3 percent, which I think is quite impressive. Where the compression is harder it's 6.3 percent, and in the high-efficiency configuration it's still 5.2 percent, with 3.1 percent in the low-delay and high-delay configurations. So that's not bad, I think. Next slide, please. If we compare this with CLPF, these are the gains we get from replacing CLPF with CDEF: the CIEDE number is 2.2 percent
in the low-complexity, low-delay configuration, and it drops to about 1.1 percent in the high-efficiency, high-delay configuration. So it's not a huge difference, but the main reason to add CDEF is to improve the actual visual quality, and in AV1 we did some subjective tests comparing CDEF with CLPF. Even though the objective change was less than 1 percent in AV1, people could still tell the difference, so that probably points towards a real difference of at least five percent. Next slide.
C: Before you finish on CDEF, I want to raise one issue related to the requirements document that we expect some substantive changes in. I think one of them may be related to support of 4:2:2 chroma format video, and I believe CDEF is one of the barriers to that, because the direction search does not support rectangular blocks. Is there any plan to address that in any way?
E: (from Arm) The directional search has only ever been done on luma. So what normally happens is that chroma uses the direction that luma found when it orients its filters, and since there isn't a direct correspondence between the directions we have in luma and the directions we have in chroma when you squeeze the chroma blocks into a rectangle, we disable the filter there. Sorry.
H: Starting in July last year at zero, the BD-rate for the compression goes down, which is good. At the last meeting we had a BD-rate gain of about 20 percent, and that is now about 25 percent. The graph has been steadily dropping with the additions of new tools in AV1, and there are still some tools left that are not yet enabled, so I expect this to drop slightly more; we'll see. Next slide, please.
H: And this is the complexity history. Note here that the y-axis is logarithmic and that it's in frames per minute, not seconds but minutes. It started at around 15 last year in July and is now around one frame a minute, so there's a change of a factor of 15, and it seems to be flattening somewhat. But again, this is a logarithmic scale on the y-axis. I think this shows that the compression gains that we have seen in AV1 don't come for free; they have a big cost.
H: So if we compare VP9 with AV1: I think currently AV1 is basically a continuation of VP9 if you plot it with different complexity settings. You have a big toolbox, and as you add more tools to the codec you get compression gains, but you also get that speed penalty, and the question remains whether that new toolbox is a better toolbox, not just a larger one.
H: It hasn't been a great focus to speed up AV1, so that will probably get more attention as the actual tools are finalized. But yeah, the reference encoder isn't that practical right now. I think the specification says that we're supposed to run 4K sequences, but we can't practically do that now, so nobody has actually been presenting test results according to the spec, because it's simply too slow.
E: So, although I got this stuff started a few years ago, those two have really been doing the bulk of the work lately, so I think most of the credit for the recent developments goes to them. Next slide. I'm going to talk a little bit about what our goals were in designing the transforms for Daala. The first one should be pretty non-controversial: we wanted an exact integer implementation.
E
It's
just
the
way
that
video
codecs
have
worked
ever
since
264.
There's
lots
of
iterative
prediction
with
unstable
filters,
so
you
want
an
exact
specified
implementation
so
that
all
all
decoders
agree
and
there's
no
drift.
We
also
wanted
to
be
able
to
support
many
different
variations
of
the
transforms
so
low
bit.
Depth
hide
the
depth
both
square
and
rectangular,
discrete
cosine
transforms
discrete
sine
transforms,
etc.
E: That said, we want to keep software complexity as low as possible, in particular paying attention to how things would be implemented in SIMD, and at the same time we want to have reasonable hardware complexity, which means we need low latency for small transform sizes. And for all these variations we want to keep transform reuse and embedded designs in mind; that will come along as we go through some of the slides here. Next slide. So, just to start us off:
E
This
is
the
the
four
point
discrete
cosine
transform
for
each
sixty-four.
It
is
very
low
complexity,
so
you
can
implement
this
with
8
ads
and
two
shifts.
It
has
a
few
drawbacks.
One
of
them
is
that
that
it
is
a
non-uniform
scale
transform.
So
the
coefficients
that
you
get
out,
even
though
the
discrete
cosine
is
this
unit
very
transform,
where
all
the
basis
functions
have
the
same
magnitude
of
1.0.
E
This
gives
you
out
coefficients
that
have
different
skills
that
you
then
have
to
multiply
by
and
that
usually
gets
absorbed
into
the
the
quantization
step.
So
you
know
say:
oh
we're.
Saving
one
multiply,
but
in
reality,
in
in
the
way
encoders
are
designed
today
we
do
rate
distortion,
optimization
with
several
different
possible
quantization
levels
for
all
the
different
coefficients.
E
So
you
actually
need
to
do
several
multiplies
in
there
in
order
to
get
a
consistent
estimate
of
distortion
that
backs
out
this
scaling
factor
and
that
those
extra
multiplies
get
multiplied
by
the
number
of
different
options
that
you
search
in
the
encoder,
which
is
we
just
saw.
You
know
this
can
be
quite
a
lot.
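For reference, here is a minimal sketch of the H.264-style 4-point integer transform core that the "8 adds and 2 shifts" figure refers to (variable names are mine; the non-uniform output scaling is left for the quantizer, as discussed above):

```c
/* H.264-style 4-point forward transform core: 8 adds/subtracts, 2 shifts,
 * no multiplies. Outputs are not uniformly scaled; that scaling is folded
 * into quantization. */
static void fwd4_h264_style(const int in[4], int out[4]) {
  const int s03 = in[0] + in[3];
  const int d03 = in[0] - in[3];
  const int s12 = in[1] + in[2];
  const int d12 = in[1] - in[2];
  out[0] = s03 + s12;
  out[2] = s03 - s12;
  out[1] = (d03 << 1) + d12;
  out[3] = d03 - (d12 << 1);
}
```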
E: This is the VP9 four-point discrete cosine transform, and I may pick on VP9 a little bit today. It's not because I think the VP9 design is bad; it's actually a fairly standard textbook design for transforms. But I think we can do a little bit better, so I want to talk about some of the improvements we've made relative to VP9, just because the VP9 transforms are the ones that I know best.
E
So
this
is
the
4-point
DCT.
It
actually
has
six
multiplies.
They
are
full
32-bit
products.
So
if
you
look
at
the
bottom
there
we
were
actually
taking
two
of
these
products
and
adding
them
together.
So
we
need
the
full
32-bit
result
in
order
to
do
that,
and
then
it
additionally
has
eight
adds
two
of
those
happen
at
32
bits
and
then
four
shifts
all
right
next
slide.
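For comparison, a paraphrased sketch of the VP9-style 4-point forward DCT structure being described: six full-width multiplies, with two product pairs summed before the rounding shift. The constants are the usual 14-bit cosine values quoted from memory; treat them as illustrative rather than a copy of the libvpx source.

```c
#include <stdint.h>

/* VP9-style 4-point forward DCT sketch. COSPI_k is roughly
 * round(cos(k*pi/32) * 2^14); values quoted from memory, shown only to
 * illustrate the structure (6 multiplies, 8 adds, rounding shifts). */
#define COSPI_8  6270
#define COSPI_16 11585
#define COSPI_24 15137

static int32_t round_shift14(int64_t x) { return (int32_t)((x + 8192) >> 14); }

static void fdct4_vp9_style(const int16_t in[4], int32_t out[4]) {
  const int32_t s0 = in[0] + in[3], s1 = in[1] + in[2];
  const int32_t s2 = in[1] - in[2], s3 = in[0] - in[3];
  out[0] = round_shift14((int64_t)(s0 + s1) * COSPI_16);
  out[2] = round_shift14((int64_t)(s0 - s1) * COSPI_16);
  out[1] = round_shift14((int64_t)s2 * COSPI_24 + (int64_t)s3 * COSPI_8);
  out[3] = round_shift14((int64_t)s3 * COSPI_24 - (int64_t)s2 * COSPI_8);
}
```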
E
So
there
are
a
few
avenues
for
improvement.
One
is,
is
simplifying
the
multiplies.
So
if
you
looked
at
the
264
design
like
we
could
just
scale
the
outputs
of
those
that
transform,
then
it
would
only
cost
four
multiplies
instead
of
six,
but
the
264
design
is
not
a
real
DCT,
it's
only
an
approximation
to
a
DCT,
so
it
would
be
a
little
bit
less
accurate,
but
we're
going
to
see
in
a
bit
we
can
actually
do
just
as
well
with
an
accurate
transform.
E: The other approach for improving things has to do with scaling. The VP9 DCT adds this factor of the square root of 2 relative to a unitary transform, and in fact it turns out that as you make the transform larger and larger, each time you double the size of the transform it adds an additional factor of the square root of 2. So this is sort of okay.
E
If
you
take
the
log
of
the
width
on
the
low
that
the
height
and
that
comes
out
to
be
even
then,
you
can
just
correct
the
thing
with
a
shift,
but
now
we
want
to
use
rectangular
transforms
like
an
8
by
4,
transform
or
something
along
that,
and
now
this
scale
factor
becomes
odd,
and
so
we
can't
correct
it
with
a
shift.
We
actually
have
to
correct
it
by
doing
one
multiply
for
coefficient
in
order
to
get
something
that
matches
the
same
same
scale
as
all
of
our
quantizers
all
right
slide.
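Restating that scaling argument (my paraphrase; only the parity of the exponent matters for the shift-versus-multiply question):

```latex
s(W,H) \;\propto\; \left(\sqrt{2}\right)^{\log_2 W + \log_2 H}
% Even exponent (e.g. 8x8): s is a power of two, so a shift corrects it.
% Odd exponent (e.g. 8x4): a stray factor of sqrt(2) remains, which needs
% one multiply per coefficient to match the quantizer scale.
```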
E: So where does this scaling actually come from, structurally? Next slide. This is sort of the textbook factorization of a type-II discrete cosine transform. It starts out with this stage here on the left, where we're basically computing sums and differences of pairs of pixels, sometimes called plus-one/minus-one butterflies or something to that effect. And then, after that, you can split the thing into a smaller discrete cosine transform and a smaller discrete sine transform. All right, next slide.
E: So that's where that factor comes from. Next slide. Because this is recursive, there's another one inside there, and as you expand the transform by a factor of 2 each time, you get an additional one of these factors of the square root of 2, and you also wind up having to do something in the discrete sine transform that is also expansionary like this, if you want the scales to be uniform. All right, next slide.
E: We'd like to get rid of this extra scaling so that we don't have all these extra multiplies in our rectangular transforms. One way we can do that is to use multiplies, and in fact, if you go back and look at VP9's four-point DCT, they actually already do this; if you flip back to slide 4, you can see it.
E: This step up here is actually the same thing as a plus-one/minus-one butterfly, but then it has scaled the outputs after that, so that they match the discrete sine transform at the bottom there. So that's one way to correct the scaling, but that only got rid of it in one stage, and we're getting one of these at every stage, so that winds up being kind of expensive. Another approach is that we can restrict ourselves to only using shifts and adds, and use asymmetric scaling.
E: There are basically two different options. The construct at the top there computes a sum and difference where the output of the second component is halved compared to what you would normally get, and then the next one computes a sum and difference where the first output is halved compared to what you'd normally get. And as you can see, you can do this just by adding one shift in between the two additions or subtractions.
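A small sketch of the two asymmetric butterflies as I understand them (two adds/subtracts plus one shift each; this is my reconstruction of the construct, not code from the draft):

```c
/* Asymmetric butterflies: two additions/subtractions plus one shift.
 * Relative to a unitary butterfly, one output is scaled up by sqrt(2) and
 * the other down by sqrt(2), so the pair stays unit-determinant and the
 * asymmetry can be cancelled in later stages. Operates in place. */
static void butterfly_half_second(int *x0, int *x1) {
  *x0 += *x1;              /* x0' =  x0 + x1        (full scale) */
  *x1 = (*x0 >> 1) - *x1;  /* x1' ~ (x0 - x1) / 2   (half scale) */
}

static void butterfly_half_first(int *x0, int *x1) {
  *x1 = *x0 - *x1;         /* x1' =  x0 - x1        (full scale) */
  *x0 -= *x1 >> 1;         /* x0' ~ (x0 + x1) / 2   (half scale) */
}
```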
E
So
what
happens
is
instead
of
instead
of
doing
an
addition
and
subtraction
and
having
both
of
the
scales
increased
by
a
factor
of
square
root
of
two.
What
we're
actually
doing
is
increasing
one
by
a
square
root
of
two
and
decreasing
the
other
by
a
factor
of
square
root
of
two,
so
they
become
asymmetric.
But
overall
you
know
the
scaling
is
unity.
So,
like
the
determinant
of
this,
this
transform
as
a
whole
is
still
1,
and
then
we
can
cancel
out
this
asymmetry
in
subsequent
steps.
E: We'd also like, as I said, to simplify the multiplies. All of these multiplies come from plane rotations between two variables: basically, in all of our transform factorizations, we've decomposed them into a series of these plane rotations where we're taking two values and rotating them by some amount. So instead of doing that as a matrix multiply, where we have four multiplies and two additions, we can get rid of one multiply and instead add an addition by using a construct like the one at the bottom here.
E: All right, next slide. We can actually also arbitrarily scale the inputs and outputs of these rotations, so, just multiplying through, you can instead derive a series of steps which looks like this. The important thing to note is that all of the complex stuff there basically just reduces down to a constant, so it's x0 minus a constant times x1, then x1 minus a constant times y0, and then y0 minus a constant times y1.
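A hedged sketch of the general shape of that three-step rotation (the classic lifting/shear form, written in floating point for clarity; the constants and names here are mine, not the draft's):

```c
#include <math.h>

/* Plane rotation as three lifting (shear) steps: three multiplies and three
 * additions, each step individually invertible. Integer versions replace
 * each "constant * x" with a multiply-round-shift by a fixed-point constant. */
static void rotate_lifting(double *x0, double *x1, double theta) {
  const double c1 = tan(theta / 2);  /* outer-step constant  */
  const double c2 = sin(theta);      /* middle-step constant */
  *x0 -= c1 * *x1;
  *x1 += c2 * *x0;
  *x0 -= c1 * *x1;
  /* (x0, x1) is now rotated by theta; the inverse simply replays the three
   * steps in reverse order with opposite signs. */
}
```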
E: If we actually had to compute a full 32-bit product, we could only do that with half the throughput in a fixed-size SIMD register. SSSE3 and NEON actually both have instructions for doing exactly this kind of multiply: a single instruction that will do the multiply, add the rounding offset, and shift the product over to the right, so none of that has to expand out to a full 32 bits; the whole thing fits in 16 bits. Next slide.
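For concreteness, the x86 side of that is the SSSE3 pmulhrsw instruction; the sketch below multiplies eight 16-bit lanes by a Q15 constant in one instruction (NEON's vqrdmulhq_s16 plays the analogous role). Framing the constant as Q15 is my illustration, not necessarily the exact fixed-point format used in the transforms.

```c
#include <tmmintrin.h>  /* SSSE3 */

/* pmulhrsw computes (a * b + 0x4000) >> 15 per 16-bit lane, so a rotation
 * constant stored in Q15 can be applied without ever widening to 32 bits. */
static __m128i mul_q15(__m128i x, int16_t q15_const) {
  return _mm_mulhrs_epi16(x, _mm_set1_epi16(q15_const));
}
```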
E: As you get larger and larger transforms, you'll also be able to share more and more of these shifts between the stages like this, and that's just because of the way that we arrange them. All right, next slide. Expanding that out, we can do an 8-point DCT, next slide, or a 16-point DCT, and that keeps going up to 64 points.
E: The other point to make is that these things do have embedded structure: both the N-point DCT and the N-point discrete sine transform are embedded inside a discrete cosine transform that is 4 times larger. That embedding actually skips a level because of the asymmetries: if you only go up one level, then we're actually taking asymmetric inputs, so it's not exactly the transform you need, basically.
E: So, a few notes on accuracy. All of these right shifts and multiplies introduce rounding errors, and we want to keep those as small as possible. The way we go about this is that we shift up the input by some number of bits before we do any of the transform, then we do the full forward transform, quantize, code, dequantize, and inverse transform, and then on the other end, when we finally get back down to pixels, we shift down the output again.
E: So how much do you shift? Well, we found diminishing returns at about four bits, and that was enough to make all of the discrete cosine transforms match a double-precision floating-point implementation after rounding to the nearest pixel value. So with just a four-bit up-shift we get the error down below one half of a pixel step for 8-bit input.
C: Matching the full, giant matrix-multiply implementation? That's good. What about higher bit depths, does the error wind up being the same there? We can go to the next slide.
E: I'll talk about that in a couple of slides, but yeah, that's it basically. The accuracy is less important for higher bit depths, because what you actually care about is accuracy relative to your quantizer, and higher bit depths use higher quantizers to get similar bit rates. So we shift up less for higher bit depths: basically, 10-bit is a two-bit shift, and at twelve bits we have no shift, so it injects a little bit more noise, but it doesn't matter. As a result, we can use the same transforms for all bit depths.
E: That's correct, you can use the exact same implementation for all the input bit depths. All right, next slide. So how does this compare with VP9? VP9 also shifts up the inputs, but by fewer than four bits, and then it shifts down the outputs by more than four bits, and it actually has to do it sometimes in between the row and column transforms too. That's because they have this extra factor of the square root of two that they grow by with every transform size.
E: So what's actually happening is that the scale of these VP9 coefficients grows as the transform progresses, so any rounding errors that you introduce early in the process get magnified as that scaling increases, whereas in Daala all the stages have the same scale, so all of the rounding errors are injected at the same level. They do accumulate, but we don't magnify them. All right, next slide; that's the one we just did.
E: Another important point to talk about is the difference between scaling and dynamic range. Everything here has orthonormal, or unitary, scaling: the magnitude of the basis functions is 1.0. But the dynamic range of the output still increases, and by dynamic range here I mean the minimum or maximum output values you can actually have.
E: All of your unitary transforms are essentially n-dimensional rotations, and you can think of the input as a big n-dimensional box. The length of the diagonal of that box is going to be longer than the length of any of the edges, so as you rotate it, you can get larger values than you started with. In fact, it's by a factor of the square root of 2 every time n doubles, which is in addition to the scaling that VP9 does, and it's not the same scaling.
E: So the question you might ask is: how big can the outputs actually be? Next slide. With a 4-bit up-shift, all the transforms with 64 pixels or fewer fit in 16 bits: that's a 9-bit residual, a 4-bit up-shift, and then 3 bits of dynamic-range expansion, which is half a bit for each of the powers of 2 in 64. That includes your 4x4, 4x8, 8x4, 8x8, 4x16, and 16x4. All of the column transforms, all the way up to 64-point, also fit in sixteen bits. That means 16 bits is the maximum size you need for a hardware transpose buffer: in between the row and column stages, the hardware has to buffer the coefficients so it can transpose them, which is a fairly significant gate cost, so being able to keep that small is nice.
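Spelling out that bit budget (a restatement of the numbers just given, for 8-bit input):

```latex
\underbrace{9}_{\text{residual}} \;+\; \underbrace{4}_{\text{up-shift}} \;+\; \underbrace{\tfrac{1}{2}\log_2 64}_{\text{range growth}\,=\,3} \;=\; 16\ \text{bits}
```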
E: It also means that when you're writing SIMD, you can write one SIMD version for the row transforms and reuse it for the column transforms, and it all fits in 16 bits for all sizes; then you can have a separate version once things start needing more than 16 bits. Comparing to VP9: they have larger intermediates in the transforms, but they always shift their final coefficients down to fit in 16 bits.
E: We think this is a mis-optimization. It's actually just as easy to do this shift-down and pack while you're doing quantization, so we have not tried to do this extra shift at the end. It also helps avoid double rounding, and it simplifies rate-distortion optimization: you don't have to have any special cases for different scale factors depending on your block size. All right, next slide.
E: But yeah, the point is you're going to have to go up to 32 bits in the transforms at some stage. Because we've eliminated this extra scaling, we do that at a later stage than VP9 does, and also, because we don't do extra shifting for high bit depths, we do it at a later stage than VP9 there too. So we can keep things in 16 bits longer, but yeah, at some point you do.
E: All right, so a few notes on reversibility. When you have steps of this general form, where you take a variable and add to it some function of all the variables except the one you're adding to, that's called a lifting step, and the function can be arbitrary. What that means is that we can make an inverse transform by just reversing all the steps of our forward transform, and it turns out that all of the steps that I have described so far, that we use to build our transforms, happen to be lifting steps.
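A tiny sketch of why a lifting step inverts exactly, whatever rounding its step function does (generic illustration, not code from the draft):

```c
/* A lifting step adds f(x1) to x0 while leaving x1 untouched. Because the
 * decoder can recompute exactly the same f(x1), including any shifts or
 * rounding inside f, subtracting it undoes the step bit-exactly. */
static int f(int x) { return (3 * x + 4) >> 3; }  /* arbitrary example */

static void lift_forward(int *x0, int x1) { *x0 += f(x1); }
static void lift_inverse(int *x0, int x1) { *x0 -= f(x1); }
```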
E: So why is this good? Why would you want to do this? We really wanted reversibility in Daala, because we used lapping instead of a deblocking filter. Deblocking filters have this property that they're low-pass, so they tend to blur out details over consecutive frames, whereas forward and inverse lapping are matched: any details that you have do not get blurred out by applying the lapping filter, they instead just get shifted around, and when you apply the opposite of the lapping filter they get restored.
E: If those two are not exactly matched, then you build up these rounding errors over multiple frames, and this is the same problem: we essentially have an unstable filter. Because we have an exact integer specification of our transforms, you would never get an encoder/decoder mismatch, but it would cost bits to correct these rounding errors in the encoder, so that was bad. All right, next slide. Do we actually need perfect reversibility? It seems to help compared to transforms that don't have it.
E: We've seen some small coding-gain improvements, but it's probably not required; we get it basically for free from the structure of our design. We don't actually have it in Daala anymore: when you do this 4-bit up-shift, then do the transform, and then do the 4-bit down-shift, that down-shift is not reversible, so that breaks it.
E: You can restore it by using twelve-bit references even if you have eight-bit input data, basically just avoiding the down-shift by four bits at the end. There's a nice blog post by Monty that goes through and shows you what this error build-up looks like and what happens when you switch to twelve-bit references, and it essentially goes away. But it also turned out that just using CLPF from Thor, or the Daala deringing filter, solves the problem by adding back, essentially, one of the low-pass filters that we were missing by not having a deblocking filter, so that prevents these errors from building up. All right, next slide.
E: That's actually something we tried back in VP9 with an early version of these transforms: I think just replacing the four-point Walsh-Hadamard transform that they use with a four-point DCT was about 25% worse in terms of the lossless bitrate. So I don't know; if you allowed using larger transform sizes instead of fixing everything at four-point, something adaptive might do better.
E: All right, next slide. The other nice feature of reversibility is the effect it has on dynamic range. As we said, the transform coefficient values are larger than your pixel values, because your forward transform expands the dynamic range. Your inverse transform is also an n-dimensional rotation, so how do we know that it doesn't expand dynamic range? Like, what if I have two coefficients, x0 and x1, and they both just barely fit in 16 bits?
E: Because the inverse just retraces the forward lifting steps, the intermediate values it produces are the ones the forward transform already produced. What that means is that I'm only guaranteed to avoid overflows if the coefficients come as the result of transforming pixels. So if I decode random garbage I might get random overflows, but we can just define those cases as undefined behavior; I don't think anybody actually cares about the quality of decoding random garbage.
E: That's the same approach H.264 took. One note about discrete sine transforms: there are two types that we care about, type 4 and type 7. For intra prediction residuals, the prediction error you get is asymmetric: the error close to the edges you're predicting from is much smaller than the error far away from those edges, which means you want an asymmetric transform to code them.
E: Then if you ask, okay, what's the optimal transform to use, it winds up being this type-7 DST. You get that by taking a linearly increasing correlation matrix, taking the limit as the correlation approaches one, solving the eigensystem, and asking what you get: the type-7 DST pops out. Type-7 DST factorizations are much nastier than the type 4s, which are the ones that we have embedded inside of our DCTs.
E: The type 4 is there at the top, and the type 7 is this thing down here. The real problem is this n plus 1/2 term inside your trig functions, which means that what this actually is, is a trig transform embedded inside of a 2n plus 1 sized fast Fourier transform, and pulling that out of there while still retaining a fast algorithm is a bit messier, since it's not a power of two.
E: Next slide. Type-4 transforms turned out to be almost as good, and they are already embedded inside of all of our DCTs, so our current approach is that we use type 7s only for the very small ones, currently just the four-point and eight-point, and then use the embedded type 4s for all of the larger DSTs.
E: We actually wind up with 39 percent fewer flops, I think, for the 32-point DST. We implemented the SIMD for the eight-point DCT and directly compared it to the existing SIMD for the AV1 transforms, and it was benchmarked at 26.2 percent faster, and that's mostly a result of using fewer multiplies and cheaper multiplies.
E: A few hardware considerations. Intra prediction requires reconstructed pixels from your neighboring blocks, so if you think about it, this serializes the reconstruction of those blocks, including the inverse-transform part of that reconstruction, which is a particular problem for encoders. In the decoder you can sort of start the transforms early and it only serializes adding the residuals, but on the encoder side you need to know what pixels to transform, so that part becomes completely serial.
E: Unfortunately, when we do our 3-multiply rotations, those multiplies are all chained consecutively: each one depends on the output of the previous one, which winds up being a bottleneck for small transform sizes in hardware. All right, next slide. So, just for the 4-point DCT and DST, we've replaced them with transforms that are not perfectly reversible and not lifting-based: we basically replace the three-multiply block with a four-multiply one, just like the matrix multiply.
E: So we can replace a bunch of the serial multiplies in our rotations with these parallel multiplies without introducing any additional multiplies. Anything with this A, B, A structure for the constants, that is, x0 plus A times x1, then a step with B, then another step with A, we can replace with the slightly more gnarly-looking thing on the right, and if you reduce it down, it's one addition, then three multiplies that all happen in parallel, and then two more additions.
E: So it's the same number of operations, but the multiplies can happen in parallel. This is, again, no longer exactly reversible, so we're still experimenting to see what impact that has on accuracy, and making sure it doesn't introduce any new potential overflows that would prevent us from keeping our 16 bits.
C
And
Florida
had
one
more
final
question
kind
of
a
broad
one.
So
these
look
like
they
compare
these
transforms.
Look
quite
they
compare
very
favorably
to
vp9
and
av1.
Have
you
looked
at
Thor,
which
is
basically
a
chibi?
See
if
you
look,
the
comparisons
to
the
Thor
transforms,
so
the
HTPC
transforms
so.
E: We haven't done direct comparisons, at least in terms of, for example, coding performance. In terms of complexity: if I understand correctly, the Thor transforms are basically giant matrix multiplies, and you can get away with that for very small transforms, but as they get much larger, I think that this will wind up being significantly faster.
I: I can see them, though. So, yeah, I'm going to present an update to the CfL draft for NETVC. If we go to the first slide: chroma from luma is essentially an intra-prediction tool, so it has no dependencies on other frames. It is only available for the chroma planes, and it basically works by predicting chroma pixels using coincident reconstructed luma pixels.
I: Let's go to the next slide to see the difference from what we proposed before. The prior proposal was based on the Daala implementation; now we've changed it to reflect what was proposed for AV1. The most significant change is that we no longer rely on PVQ, so prediction is now done in the spatial domain. We consider only the AC contribution of the reconstructed pixels (I'll talk about that a bit later), but that is similar to what was happening before in the PVQ version of CfL. We use the existing DC_PRED, the DC prediction, for the chroma DC contribution. This is already available in AV1 and there are already fast implementations.
I: It requires no signaling, and it is more precise than what was used before, so that's also very interesting. Going on to the next slide, the differences: we can compare with Daala and Thor, which are codecs that people here know.
I: As I already said, we went away from the frequency domain and are now doing the prediction in the spatial domain. The Thor implementation is implicit; as for the signaling, the Daala implementation used the PVQ gain and the sign bit to send the information, while we send the information explicitly, using joint signs and an index value.
I: The activation mechanism was a threshold for Thor; it was also signaled in Daala. We have a special UV-only mode in AV1: AV1 has separate prediction modes for intra luma and intra chroma, so we take advantage of that to have this UV-only mode called CFL_PRED.
I: On the encoder side, instead of doing model fitting, we do a rate-constrained search, and we do no decoder-side model fitting, since the information is signaled in the bitstream. Moving on to the flow of the operations: if chroma subsampling is used, the luma surface will not be the same as the chroma surface, so we must do a luma subsampling that is equivalent to the chroma subsampling being done. We subtract away the average, then we decode the signaled scaling factors from the bitstream and multiply; these are in Q3 precision, but once we multiply, that goes down to Q0, and we add in the chroma DC_PRED to that value, and that gives us our final prediction.
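A compact sketch of that prediction flow as described (per pixel, one chroma plane; the Q3 rounding and the helper shown here are illustrative, not lifted from the draft or libaom):

```c
/* CfL prediction sketch: take luma already subsampled to the chroma grid,
 * keep only its AC part, scale by the signaled alpha (Q3), and add the
 * ordinary chroma DC prediction. */
static void cfl_predict(const int *luma_sub, int n, int alpha_q3, int dc_pred,
                        int *chroma_pred) {
  long sum = 0;
  int i;
  for (i = 0; i < n; i++) sum += luma_sub[i];
  {
    const int avg = (int)(sum / n);                /* luma average (DC)  */
    for (i = 0; i < n; i++) {
      const int ac = luma_sub[i] - avg;            /* AC contribution    */
      const int scaled = (alpha_q3 * ac + 4) >> 3; /* Q3 -> Q0, rounded  */
      chroma_pred[i] = dc_pred + scaled;           /* add chroma DC_PRED */
    }
  }
}
```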
I: If we look at the codebook that we end up with on the next slide... oh, okay, never mind, that's fine. So basically, why do we go with the chroma DC_PRED? So that we don't have to signal the beta value: alpha will be signaled, but beta won't be. Moving on to the next slide.
I: We have the scaling codebook. This basically shows you what happens when we do the search: we start in the middle of this grid, and we can change the scaling factor for the chroma correction, for chroma Cr and chroma Cb, moving from negative to positive, and you can see all the different tones that you can get. This, of course, is only a subset of the codebook we have.
I: It goes from minus 2 to 2 in Q3, so it goes up in steps of 1/8. 0,0 is not allowed, as that is DC_PRED. We pick our value using a rate-constrained search, as I said before. Since we are signaling the alpha value, we can't just use a linear regression, because that value won't be RD-optimal. So what we do instead is the same thing as for any other parameter in the encoder that requires rate.
I: We take the weighted rate, add that to the distortion value, and pick the parameter that minimizes the total, and that gets signaled to the decoder. The next slide explains how we go about the signaling. We joint-code both signs: there are going to be two scaling parameters, one for Cr and one for Cb, so we join them together.
I: A sign can either be zero, negative, or positive, and since 0,0 isn't allowed (because that's DC_PRED), we have eight values, which we send to our multi-symbol encoder as an eight-value symbol. Then, for each non-zero scaling factor, we send a value excluding zero but going all the way up to 2 inclusive, again with a step of 1/8. That gives us 16 values for our multi-symbol coder, which actually maxes out what multi-symbol entropy coding can give us, a 16-value CDF. Going on to the next slide.
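A hedged sketch of what that signaling amounts to on the decoder side (the symbol-reader call and the exact index-to-alpha mapping are illustrative stand-ins, not the draft's syntax):

```c
/* CfL alpha signaling sketch: one 8-ary joint-sign symbol (3 sign states per
 * plane minus the disallowed zero/zero case), then a 16-ary magnitude index
 * per non-zero sign, covering 1/8 .. 2 in steps of 1/8 (Q3).
 * read_symbol() is a hypothetical multi-symbol entropy-decoder call. */
int read_symbol(int num_values);

typedef struct { int alpha_u_q3, alpha_v_q3; } CflAlphas;

static CflAlphas decode_cfl_alphas(void) {
  static const int sign[3] = { 0, -1, 1 };
  const int joint = read_symbol(8) + 1;                /* 1..8, skips (0,0) */
  const int su = sign[joint / 3], sv = sign[joint % 3];
  CflAlphas a = { 0, 0 };
  if (su) a.alpha_u_q3 = su * (read_symbol(16) + 1);   /* 1..16 -> 1/8..2   */
  if (sv) a.alpha_v_q3 = sv * (read_symbol(16) + 1);
  return a;
}
```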
I: We can see results from our analyzer; there's a link you can click there (sadly, it got moved behind the image). You can see the distribution of how many times modes get used. These are the UV modes in AV1: about 44% of the time DC_PRED gets picked, but CfL comes in at about 17%, and we observe it between 15 and 20% across different sequences for AV1. As you can see, the other contending modes are still slightly below.
I: So it actually outperforms the other chroma modes that are available in the encoder, and you can actually see this live in the analyzer in real time. Moving on to the results for subset 1: there is a minus 4.65% CIEDE 2000 BD-rate, so it gives us a rate decrease at the same level of quality. We use the CIEDE value because it is the only metric that considers both luma and chroma, and does so in a perceptually uniform way.
I
If
you
click
on
the
links
below,
you
can
see
the
full
breakdown
with
all
the
values
so
subset,
one
are
still
images
and
objective
one
fast,
our
video
sequences,
as
you
can
see.
In
that
point,
we
are
giving
about
an
on
average
two
point:
forty
one
percent
reduction.
This
is
for
a
single
tool,
CFL
overall
of
81.
I: So that's pretty interesting. There are also PSNR gains, for both luma and chroma. The reason is that, since we have better predictions, we actually reduce the number of bits; this metric shows gains because we get the same level of quality with fewer bits, and since BD-rate is the area between the rate-difference and quality curves, that gives you a negative value. So that's very good. If we move on to the next slide, we see that it is actually very good for screen-content coding.
I: Here we have, on average, about a 5% reduction for the Twitch gaming data set, which is on slide 11. Notable mentions here are the Minecraft sequences: CfL alone gives a minus 20% reduction on both Minecraft sequences in that test set, and we also see good results for GTA and StarCraft, at about 5% each for CIEDE 2000, and still some significant gains for PSNR luma.
C: Thank you very much, and thanks for coming and sitting through to the end. Any other questions for Luc on the chroma-from-luma tool? Any other final items off the agenda? All right, so make sure that I get the blue sheets signed. If you came in late and you're still here from NETVC, make sure to sign them. Where is the blue sheet, by the way? If anybody needs it, please raise your hand and we'll get it to you. Otherwise, thank you very much; I'll see you at IETF 101.