From YouTube: IETF99-NETVC-20170717-1550
Description
NETVC meeting session at IETF99
2017/07/17 1550
https://datatracker.ietf.org/meeting/99/proceedings/
A
Hello, hello, hello. That works, right? Okay, good. So welcome everyone to Internet Video Codec, NETVC. This is our only session this week. We have a relatively packed agenda, but let's go through our introduction slides. We need a Jabber scribe; our Jabber room is netvc@jabber.ietf.org. Do we have a Jabber scribe? It's an easy job. Thank you very much, sir; thank you, Jonathan, for nominating yourself to be Jabber scribe. We also need a note-taker. I'll play the Jeopardy music in a minute; don't make me sing. Yay, thank you so much. Sorry, what's your name? Tessa. Thank you very much; so Tessa will be taking notes, however you wish, Tessa, it's up to you. Blue sheets are going round as per normal. We do have one remote presentation today, so please say your name clearly and loudly at the mic, so that Tessa can get it down in the notes properly and so that the remote presenters will know who is speaking. I will try to speak slower as well. Okay, next slide. The Note Well; people should be quite familiar with the Note Well.
B
How's everyone enjoying this on the remote side, the periodic booms? Okay. Now that everybody's muted their headphones: the requirements document is basically ready for progressing. We're at version -06 right now, and there were a few changes in that version, mostly around section 3.1.1, calling out that the objectives for compression efficiency really apply to all of the use cases that are defined earlier in section 2. Before, it tried to call out some specific use cases, like natural content as well as screen-sharing content, and rather than enumerate them it's better to just reference section 2, which has all of the use cases. So the compression efficiency targets apply to all of those use cases, and there were no other substantive changes. Where we are with the document right now: we completed the working group last call after this update, the -06 update, back at the end of May, and the current status...
D
Alright, is that better? Yes, I can hear that now. All right, so I've looked at these slides for about 30 seconds and didn't make them, so this should be great.
D
D
So there's a subjective testing procedure defined, basically using the same codec and command-line configuration as all the objective tests, but we only select one quantizer each for high and low latency to test visually. We added a subjective test set, which is basically just a small subset of the full objective test set, because we have to actually look at these manually, so smaller is better. And we have implemented a tool in arewecompressedyet and the analyzer which supports subjective testing.
D
So basically it gives you a split view, or lets you flip back and forth between the videos; it randomizes the presentation order, and you vote for which one you think looks best, or whether it's a tie, and there are instructions that it shows the voters for all of that. Right, next slide.
D
So, on the statistical analysis side: generally you need about twelve viewers to get results that are significant. That's not a guarantee that if you have 12 you'll have significant results, but you need probably at least that many. All the voting is "prefer A" or "prefer B", so there's no indication of how strong that preference is, and then we test for significance using a sign test. Anyone who votes "tie" basically counts as half a vote for and half a vote against, or half a vote for A and half a vote for B, however you want to think about it, with the main effect that as you get lots of ties it becomes much harder to get a significant result. All p-values under 0.05 are considered significant.
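(A minimal sketch of the sign test as described above, with ties counted as half a vote each way and the 0.05 threshold; the function name and the exact tail calculation are illustrative, not taken from the actual tooling.)

    import math

    def sign_test_p_value(votes_a, votes_b, ties):
        # Ties count as half a vote for each side, as described above.
        a = votes_a + ties / 2.0
        n = votes_a + votes_b + ties
        # Two-sided binomial test against p = 0.5: take the more extreme tail,
        # double it, and cap at 1.
        k = max(a, n - a)
        tail = sum(math.comb(n, i) for i in range(math.ceil(k), n + 1)) / 2.0 ** n
        return min(1.0, 2.0 * tail)

    # Example: 12 viewers, 9 prefer A, 1 prefers B, 2 ties -> p is about 0.039,
    # which would count as significant under the 0.05 rule.
    p = sign_test_p_value(9, 1, 2)
    print(p, "significant" if p < 0.05 else "not significant")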
D
All right, so subjective-1 is the test set; it's a subset of objective-2-slow, and these are the five videos in there.
D
It still looks like a blocky mess, so that's something that will probably change in the future. Next slide. So we have a few examples of some of the tests that we've done. One of them was a test of the current Constrained Directional Enhancement Filter, CDEF, versus the old constrained low-pass filter, CLPF, that was in Thor. Steinar is going to talk in a bit more detail about those during his presentation.
D
But those are a couple of links you can click on, or even type in, they're short enough, so you can get an example of the kind of tests that we've been running. These have all completed at this point, so while you're welcome to go ahead and vote on things, we've already tallied the results there. If you're curious, CDEF wound up being significantly better than CLPF, at least for several of the videos that we tested, at a statistically significant level.
B
I think that's it. If you want, I'll jump to an example; if people are interested, I have one loaded.
B
So that's what this subjective test looks like in the interface. There's a little tutorial to guide you through it, and once you've seen it and have some experience with the subjective testing, we'll be happy to start forwarding all of these subjective testing requests to the list, and people can start evaluating the tools that we're looking at.
B
Yeah, the resolution on here is killing it; this is not designed for 4K.
G
There have been no changes in the GitHub repository since Chicago, but there's still some work that has been done. I think the consensus in Chicago was that we should aim to have both Thor and Daala converge, and that would include merging the loop filters, the Daala deringing and the Thor CLPF. Jean-Marc presented how we did that for AV1, and so I began doing that for Thor as well, but I haven't quite finished yet. Also, Thor is lacking proper entropy coding, so that's also on the list.
G
G
So the original CDEF design had a directional filter, which corresponds to the first Daala deringing filter, and then a cross filter corresponding to the Thor CLPF, and the second filter is applied on top of the first filter. That gave some hardware concerns over line buffer requirements, because both filters can do vertical filtering, so when you apply one on top of the other, the line buffer requirement increases. That was originally addressed by restricting the second-stage filter in certain cases, but I think that was really a quick fix.
G
So these are the taps. The first eight matrices are the primary taps, which will be weighted with the primary strength, and the lower matrices are the arrangement of the secondary taps, which will have a separate strength, in the single-pass filter. I tried both this set of taps and also a few more taps, extending the upper eight matrices to 7x7 so that there were two extra taps, but that didn't change the objective results. So this is what I am currently implementing for Thor. Next slide.
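(A rough sketch of the single-pass combination described here: primary and secondary taps applied in one pass, each weighted by its own strength. The tap values, the constraint function, and the normalisation are illustrative, not the actual CDEF tables.)

    def constrain(diff, strength):
        # Illustrative clipping of a neighbour difference by the filter strength.
        return max(-strength, min(strength, diff))

    def single_pass_filter(center, primary_neighbours, secondary_neighbours,
                           primary_strength, secondary_strength,
                           primary_taps, secondary_taps):
        # One pass: sum primary and secondary contributions, each constrained
        # and weighted by its own tap and strength, then add to the center pixel.
        total = 0
        for px, tap in zip(primary_neighbours, primary_taps):
            total += tap * constrain(px - center, primary_strength)
        for px, tap in zip(secondary_neighbours, secondary_taps):
            total += tap * constrain(px - center, secondary_strength)
        # Normalise by the (illustrative) fixed-point scale of the taps.
        return center + (total + 8) // 16

    # Example: a center pixel with four primary and four secondary neighbours.
    out = single_pass_filter(100, [104, 97, 110, 90], [102, 99, 101, 98],
                             4, 2, [3, 3, 2, 2], [2, 2, 1, 1])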
G
So these are the objective results comparing the two-pass filter with the one-pass filter; negative numbers mean that one pass is better. In luma, PSNR is about 0.2 percent better, which is close to the noise range, but at least on the right side of zero. If you look at the chroma numbers, they are better, and in particular if you look at the CIEDE2000 numbers, which combine luma and chroma, we get about half a percent, which is not much, but it's nice for something.
G
The tests were done in AV1, but again I don't think that would be much different from what we would see in Thor. And as they mentioned, there was a significant preference for CDEF in some cases in the low-latency tests; for the high-latency cases there was no significant preference, but CDEF still got more votes than CLPF for every sequence, both in low delay and high delay; the numbers were just not significant.
G
G
So these are the results for all the sequences: red is the vote count for CLPF, gray are the tie counts, and CDEF is the green bar. In all cases there are more votes for CDEF, and in two cases the difference is significant; this is for low latency. If you move on to high latency on the next slide, there's no significant difference, but again CDEF has more votes than CLPF.
G
Now for how the compression and complexity trade-offs are looking. To assess that I have been using the regular objective-1-fast set. I didn't use objective-2-fast, because from time to time the objective-2-fast test breaks AV1; it might have been fixed now, but I made a test so that we could see how AV1 has been doing over time, so I needed to do it with the old test sets. I also selected a subset of the objective-1 test, which is just the video conferencing content.
G
G
I also ran VP9 and AV1 in both error-resilient and non-resilient modes, because Thor is always error resilient, so in order to do a proper apples-to-apples test I did both, and where I compare the different codecs I used Thor in high-complexity, low-latency mode as the BD-rate anchor. Next slide.
G
A
G
So if you look at the complexity, and here the y-axis is logarithmic and it shows frames per minute, not frames per second, it started a year ago at about twenty-three frames per minute, and the latest code will run the same sequences at 1.9 frames per second... I'm sorry, per minute.
B
D
Yeah, Tim Terriberry from Mozilla. I'm not sure exactly which commits Steinar measured, but one possibility is that there were some changes to select which reference frames to use for each block independently of searching all the possible coding modes for that block, which allowed you to make much quicker selections of which reference frames to use. That was something that happened after we expanded from three reference frames to six reference frames, so the expansion probably made it much slower, and then speeding up that selection may have made it faster again.
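(A hedged sketch of that kind of encoder shortcut: prune the reference-frame candidates for a block with a cheap cost estimate first, then run the full mode search only over the survivors. The cost function, keep-count, and return shapes are illustrative, not what the AV1 encoder actually does.)

    def select_references(block, candidate_refs, cheap_cost, keep=2):
        # Rank all candidate reference frames by a cheap cost estimate
        # (for example a fast motion-estimation SAD), then keep the best few.
        ranked = sorted(candidate_refs, key=lambda ref: cheap_cost(block, ref))
        return ranked[:keep]

    def encode_block(block, candidate_refs, cheap_cost, full_rd_search):
        # The expensive rate-distortion mode search only runs over the pruned
        # set, which is much cheaper than searching all modes for all six refs.
        refs = select_references(block, candidate_refs, cheap_cost)
        return min((full_rd_search(block, ref) for ref in refs),
                   key=lambda result: result[0])  # assumes (rd_cost, mode) tuples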
G
So over the last three or four months there have been a lot of new tools being added to the code base and enabled by default, and I'm sure that has happened without the optimizations being fully done; also some of the tools compete for the same gains, and there is still some work left to get a proper integration.
G
I could just have been unlucky in picking exactly those commits, because I selected whatever was in the repository on the 1st and 15th of every month, I think. But it does show a trend, and it roughly corresponds to the compression, so basically higher compression comes with a cost.
D
So, Tim Terriberry again: you're probably thinking of the Wikipedia clip, which is a screen capture of somebody scrolling through a Wikipedia article. There are also a few Twitch videos, one including Minecraft, which may benefit from the screen coding tools, but I don't think the benefit was nearly as large as the benefit for Wikipedia.
G
G
It will add some more complexity, obviously, but it's not that huge; we're talking about a few percent of running time, depending on the complexity setting. And the entropy coder will likely add some complexity, but again it's not doubling or anything like that. And the screen content tool, it hasn't been invented yet, so it's hard to tell.
G
B
D
There are a few, but they're small. Okay, all right. So I'm not Thomas Daede, but he did most of the work for this, so his name's on the slide. I basically wanted to go over this change, which was just something that we discovered while working with the VP9 RTP payload specification and thought, that's not great, maybe we could make that better. So basically we had a couple of requirements. Next slide.
D
If you want to do something like temporal scalability, it should be possible to determine and control which previously coded frames are dependencies of the current frame. So if I have a bunch of layers, I want to know when I can actually drop a frame, and I want to be able to construct the layers in such a way that I can drop frames without breaking anything.
D
D
Conversely, if I want to have error resilience, it should be possible to determine if the decoder is missing a frame that's required for decoding; that way I can ask for it again, or I can decide to drop some frames, and that lets you build a decoder that never shows a broken frame. So this is sort of like the previous case, but instead of intentionally deciding which frames to drop, sometimes I just won't get a frame and then I have to figure out how to handle it. Alright, next slide.
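(A small sketch of the checks both requirements boil down to, assuming the dependencies of each frame are explicitly known to the middlebox or receiver; the data layout is made up for illustration.)

    def is_decodable(frame_deps, received):
        # Error-resilience check: a frame can be decoded only if every frame
        # it depends on has been received.
        return all(dep in received for dep in frame_deps)

    def can_drop(frame_id, later_frames):
        # Temporal-scalability check: a frame can be dropped only if no
        # later frame lists it as a dependency.
        return all(frame_id not in deps for deps in later_frames.values())

    # Example: frame 4 depends on frames 1 and 3; frame 3 was lost.
    print(is_decodable({1, 3}, received={0, 1, 2, 4}))      # False
    # Frame 2 (an enhancement-layer frame) is referenced by nothing later.
    print(can_drop(2, later_frames={3: {1}, 4: {1, 3}}))    # True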
D
So let's talk about how this works for VP9. There are a bunch of reference frame dependencies. Basically, you're allowed up to three reference frames: each frame can reference up to three different frames out of a pool of eight that the decoder maintains, and these are implicitly or explicitly signaled with picture IDs in the RTP mapping. The implicit version is basically that you just set up a pattern that gets used over the whole group of pictures, and in the explicit version the frame header just has a list of up to three picture IDs, and those are the ones you reference. But then there's this other set of dependencies, which comes from what VP9 calls frame contexts.
D
What these basically are is the probabilities used for the entropy coding. VP9 stores probabilities that are backward-adapted based on data from previous frames, and the decoder maintains four independent sets of these probabilities. Each frame signals which one it wants to use, and can optionally write back to that same set with updates based on the data that was decoded from the current frame. This choice is completely uncorrelated with your reference pictures or picture IDs or any of that other stuff.
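(A toy model of the decoder-side state just described, only to make the hidden dependency concrete; the field names are invented for the sketch and are not VP9 syntax.)

    DEFAULT_PROBS = {"mode": 0.5, "mv": 0.5}   # stand-in for the static defaults

    class Vp9LikeDecoder:
        def __init__(self):
            # Four independent probability sets, separate from the eight
            # reference picture slots (not modelled here).
            self.frame_contexts = [dict(DEFAULT_PROBS) for _ in range(4)]

        def decode_frame(self, ctx_idx, write_back, adapt):
            # The frame header picks one of the four contexts by index,
            # completely independently of which reference pictures it uses.
            probs = dict(self.frame_contexts[ctx_idx])
            # ... entropy-decode the frame with `probs` ...
            updated = adapt(probs)  # backward adaptation from decoded symbols
            if write_back:
                # The hidden dependency: a later frame that reads this slot
                # now depends on *this* frame, even if it never references
                # its pixels.
                self.frame_contexts[ctx_idx] = updated

    # Example: frame N writes slot 2; a later frame that reads slot 2 cannot be
    # decoded correctly if frame N was lost, regardless of its reference pictures.
    dec = Vp9LikeDecoder()
    dec.decode_frame(2, write_back=True,
                     adapt=lambda p: {k: v * 0.9 for k, v in p.items()})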
D
So, next slide, you can imagine this creates some problems. If you lose a frame in the error-resiliency case, you don't know which slot it updated, so you actually no longer know if you can decode any frame; but also, the last frame to update the slot you're using might not have been one of your reference frames.
D
So if you're going through your RTP headers and saying, okay, do I have all the frames I need to be able to decode this for the current layer, or can I safely drop this frame and not break anybody else, you don't actually know unless you parse into the packet and figure out which of these frame context slots it's updating and what other frames that affects.
D
So there are a couple of ways that we could handle this, but basically what's happening here is that we've introduced this potential hidden fourth frame dependency, and for people who are designing RTP mappings this is surprising, because everybody thinks, oh, I told you what reference frames to use, that's all you needed to know, right? But actually there's this extra dependency, and the RTP mapping only signals three picture IDs. So there are a couple of ways we could fix this: we could signal a fourth picture ID.
D
D
Or we could do better. And then the final problem is that you can't fork the probabilities and evolve them independently, because of the requirement that you can only write back to the slot you read from. Basically, what that means is that every layer, if it wants to have its own independent set of probabilities, has to pay the cost of adapting them from the static defaults independently of all the other layers, so you don't get to share any of that overhead. Next slide.
D
So we've made a proposal for AV1, because AV1 basically has all these same problems, and then more problems on top of that. One change that AV1 did make is that it now explicitly signals the frame IDs in the codec payload instead of having them in the RTP header, and that's actually good; it means it gets done consistently the same way everywhere. It now allows up to six reference frames per frame, still drawn from a pool of eight.
D
D
Temporal motion vector prediction in the original design always picked the motion vectors from the last coded frame, and if you were coding things in such a way that the last coded frame wasn't going to be available, or you didn't want to rely on it being available, then you just didn't have temporal motion vector prediction; sorry, you couldn't use it. So that was sort of fixed up by this.
B
From the floor mic: just a comment on the first one, for resilience. I'm not sure what you meant by frame IDs there, but for resilience we also have these frame numbers now that have been added, which go beyond just which one of eight; you can actually have a much larger frame number, like a 10-bit or 12-bit frame number, so that if you drop one, you actually know that you dropped one.
D
Yeah, yeah, that's what I meant; it's basically the same as the picture IDs in the VP9 RTP mapping. I think they're not necessarily the same number of bits, but it's a similar idea. Okay.
D
So this is basically the situation now. You have this pool at the top of reference frames; each one of them has a buffer of actual pixels in it, and, as I said before, we have these temporal motion vectors that get saved for use in motion vector prediction in future frames, and with the temporal MV signaling, what they did is they just moved that buffer into the reference frame buffer.
D
So every reference frame has a copy of the motion vectors that were decoded with that reference frame, and then when you pick your list of references to use for the current frame, the first one becomes the one that you draw those motion vectors from. But then down at the bottom here there are these frame context slots that have all the probabilities in them, and for those you just point to some index in that table.
D
No, that's just coded in the header, completely independently of all the reference frames. And then you have this global motion data, which is just always taken from the previous frame, and if you don't want to use the previous frame, then, I'm sorry, you don't get to use global motion. All right, and so with our proposal it looks more like this: basically we move all the probabilities up into the reference frames as well, and also the global motion data, as this diagram shows.
D
D
So now what happens is that whatever is the first frame in your list of reference frames, you now draw from it not only the reference pixels but also all of your motion vectors, all of your probabilities, and the reference global motion data that you predict from; everything just comes out of that first slot you're pointing to. Jonathan?
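(A sketch of the proposed state layout, with everything hanging off the reference frame slots and the "extra" state drawn from the first listed reference; the names and the dictionary shapes are illustrative, not the actual AV1 structures.)

    from dataclasses import dataclass, field

    @dataclass
    class RefSlot:
        pixels: object = None            # the reconstructed frame
        motion_vectors: object = None    # temporal MVs saved with this frame
        probabilities: dict = field(default_factory=dict)  # entropy-coder state
        global_motion: object = None     # global motion parameters

    ref_pool = [RefSlot() for _ in range(8)]   # the decoder's pool of eight

    def start_frame(ref_indices):
        # The frame header lists up to six of the eight slots as references.
        refs = [ref_pool[i] for i in ref_indices]
        primary = refs[0]
        # Everything that used to live in separate frame-context / tempmv /
        # global-motion state is now drawn from the first listed reference.
        return primary.probabilities, primary.motion_vectors, primary.global_motion

    def finish_frame(dst_slots, decoded):
        # Storing the frame into a slot also stores its updated probabilities,
        # its motion vectors, and its global motion parameters.
        for i in dst_slots:
            ref_pool[i] = RefSlot(decoded["pixels"], decoded["mvs"],
                                  decoded["probs"], decoded["global_motion"])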
D
Why did you make it the first slot rather than a selectable slot? Because I don't want to pay the bits for the selection cost, and coding fewer things in the header is generally good from an IPR perspective.
D
So basically we remove all the frame context slots; those just become reference frame slots. We remove all the syntax elements for saving and restoring frame contexts, and it's actually more than three elements, because you also had to code a bit to say whether you wanted to write back to that buffer. Instead, we always save a frame context with a reference buffer: whenever we store a reference frame into one of those slots, we also store the updated probabilities from the current frame, the temporal motion vectors, the global motion data, and so on. We also no longer need syntax to reset frame contexts. On a key frame we just reset all the reference frames, which includes resetting the probabilities and everything else; on an intra-only frame we reset just the one specific reference slot that the intra frame gets stored back to, but not any of the others; that's an intra frame that is not a key frame. Next slide.
D
So there are a few complexities with this. There's now, as Jonathan points out, this interaction between a reference's position in that list and what its function is. In our current encoder the first reference was basically always the last frame, the most recent frame from the same layer, and so now you need to reorder that reference list if you want to control which frame you take the probabilities from.
D
D
So, as I said, if we have an intra-only frame that is not a keyframe, there's currently no way to use a previous frame context, so your probabilities always get reset. That's the same way things worked in VP9, and we didn't change it because we didn't think it was that useful. And finally, previously you could take probabilities from a non-reference frame, and now you can't, just because there's no way to code that.
D
But since we can now list up to six of our eight potential references as references for the current frame, the impact of that seems kind of low. Whichever frame you want to draw probabilities from is probably going to have some useful pixels in it to predict from, and if not, there's probably some other frame you could drop instead that wouldn't have been that important anyway, with a list of six out of eight.
D
So if you're decoding a non-reference frame... yes, you could in fact update a context and then use that in some future frame, correct. If nothing ever references it, then you can't use those probabilities; you have to write them back to be able to use them.
D
One caveat is that the global motion part is relatively recent, and we're doing that as part of the global motion proposal, which is not complete yet, so that hasn't happened yet. And while we were working on this, people started doing frame size prediction based on the previous frame as well, so now we need to move that in there too. But the main proposal is still to put everything in this frame context inside of a reference frame, so the main idea doesn't change; it can handle all these new things that people are adding. I think that's everything on this proposal. Anyone have any questions about any of that? All right, then I will switch gears and talk about a completely separate topic. I'm also not Luc Trudeau or David Michael Barr, but again, those are the people who actually did all the work that I'm about to talk about, and the tool I'm going to talk about is chroma from luma. Next slide. Basically, we have changed this a lot from previous proposals.
D
The stuff I'm going to talk about now is basically an evolution of the stuff we presented in draft-egge-netvc-cfl over a year ago. We've changed essentially everything, and this is complementary to the proposal in draft-midtskogen-netvc-chromapred, which is a variant of CfL used for inter prediction; what I'm going to talk about is solely used for intra prediction. Right, next slide. For those of you wondering what chroma from luma is: the idea is to try to exploit local correlation between the different color planes.
D
So originally we had designed CfL to work within Daala, which is a primarily frequency-domain based codec, so in Daala, chroma from luma predicted frequency-domain coefficients directly. That's hard to do in other codecs, particularly in AV1: for example, there are up to 16 different transform types, and the luma transform type might not match the chroma transform type, and the luma transform size may not match the chroma transform size.
D
That last one could sometimes happen in Daala, but since everything was a DCT, we sort of had a way to map from one to the other. If you have to expand that to work with all the different transform type combinations, it gets really complex and hard, so we gave up and said, maybe we should just do things in the spatial domain.
D
So when it works, it does okay, but when it doesn't work it can be really, really bad, and so we said, how about instead we just explicitly signal the model. When we did this in Daala, we actually got a small gain compared to trying to build the model implicitly, so that's what we're going to continue to do. All right, next slide. To compare this against things other people have done: LM mode is the HEVC proposal, the original one; Thor CfL is the draft-midtskogen one I talked about earlier;
D
Daala CfL is our previous work, and then the proposed thing over there on the right is what we're doing now. Compared to Daala CfL we've moved back from the frequency domain to the spatial domain; like Daala CfL, we are now doing explicit signaling of what the linear model is. The actual signaling is a little bit different because we're no longer using PVQ, which, you'll remember, is our perceptual vector quantization.
D
So we've basically just added a new intra prediction mode that is only used for the chroma planes, so it's UV-specific, and that signals when to use this; and as I said, we no longer require PVQ because we're doing everything in the spatial domain. And now, on the encoder side, we don't do an explicit model fit; we actually just search all the possibilities that we want to encode, and then on the decoder side...
D
So we average the luma pixels over the whole transform block, and then we also do any subsampling that we need to convert from 4:4:4 down to 4:2:0, and subtract off that constant offset, so now we're left with basically just the contribution to the AC coefficients, but still in the spatial domain, and we feed that into a search for the best linear parameters, one for each of the two color planes, Cb and Cr. Then on the bottom there, we take the original chroma pixels, and we also want to factor the DC term out of chroma; but since the decoder doesn't know what the reconstructed chroma looks like, we just do DC chroma prediction, which is sort of the best guess of what the DC will be, and we subtract that out and feed it into the search as well. Then the search goes over all of the possible choices of alpha for each color plane and explicitly codes that to the bitstream.
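(A simplified sketch of the prediction just described: subsampled luma with its block average removed, scaled by a signaled alpha per chroma plane, and added to the chroma DC prediction. Plain Python lists stand in for blocks; this shows the idea, not the actual AV1 integer arithmetic.)

    def cfl_predict_block(luma_sub, dc_pred, alpha):
        # luma_sub: luma already subsampled to the chroma resolution.
        avg = sum(sum(row) for row in luma_sub) / (len(luma_sub) * len(luma_sub[0]))
        # "AC" contribution: luma minus its average, still in the spatial domain.
        return [[dc_pred + alpha * (px - avg) for px in row] for row in luma_sub]

    def encoder_pick_alpha(luma_sub, chroma_orig, dc_pred, candidate_alphas):
        # Encoder side: no explicit model fit, just try every candidate alpha
        # and keep the one with the smallest squared error.
        def sse(alpha):
            pred = cfl_predict_block(luma_sub, dc_pred, alpha)
            return sum((c - p) ** 2
                       for crow, prow in zip(chroma_orig, pred)
                       for c, p in zip(crow, prow))
        return min(candidate_alphas, key=sse)

    # Example: a 2x2 chroma block whose variation tracks luma with slope 0.5.
    luma = [[100, 120], [80, 140]]
    chroma = [[60, 70], [50, 80]]
    alpha = encoder_pick_alpha(luma, chroma, dc_pred=65,
                               candidate_alphas=[-0.5, 0, 0.25, 0.5, 1.0])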
D
So there are a couple of choices here that we made for efficiency reasons. When we have a prediction block, we can subdivide it into multiple smaller transform blocks. When we do our luma average, we do it over just a transform block, which lets us do reconstruction transform block by transform block and basically minimizes the amount that needs to be buffered in hardware and things like that.
D
D
But doing the same for the chroma DC would make your search really hard, because every time you pick a different alpha, you would have to do a full transform and reconstruction to figure out what the DC prediction for the next transform block would be, just to figure out what the error impact of choosing that alpha would be for that transform block. So doing the DC prediction over the whole prediction block at once avoids that whole problem. All right, next slide. The decoder side is again pretty simple.
B
D
So, right now they're jointly coded. Basically what happens is that we code an angle in a plane of alphas: you have a two-dimensional plane of alphas, one axis for Cb and one for Cr, and we code a direction in that plane and then a magnitude along that direction. To the extent that they predict each other, the probabilities for those code points will increase, to the extent that the two are correlated. Okay.
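(An illustrative way to view that joint signaling: the (Cb, Cr) alpha pair mapped to a direction plus a magnitude, each of which would then be entropy-coded with its own adaptive probabilities. The quantization steps here are invented for the example and are not the actual codebook.)

    import math

    def alphas_to_symbols(alpha_cb, alpha_cr, num_angles=16, mag_step=0.0625):
        # Direction in the (alpha_cb, alpha_cr) plane, then a magnitude along it.
        angle = math.atan2(alpha_cr, alpha_cb) % (2 * math.pi)
        angle_idx = int(round(angle / (2 * math.pi) * num_angles)) % num_angles
        mag_idx = int(round(math.hypot(alpha_cb, alpha_cr) / mag_step))
        return angle_idx, mag_idx

    # Example: correlated alphas map to a small set of frequently used symbols.
    print(alphas_to_symbols(0.25, 0.25))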
D
D
And, of course, there are complications. The first one is sub-8x8 block sizes for 4:2:0, and also the other chroma subsampling formats. What happens for 4:2:0 is that if your luma blocks are smaller than 8x8, we don't want chroma transforms smaller than 4x4, so in our subsampled chroma we use one 4x4 transform, which then covers the same spatial extent as multiple luma blocks.
D
So as a result, that means you can actually have some of the blocks in this sub-8x8 region be inter coded while the chroma winds up being intra coded. So now we have to buffer luma from the inter-coded pixels as well as the intra-coded pixels in the sub-8x8 regions, and that might be a surprise; in fact, we've implemented this incorrectly at the moment, but we'll fix that. And then the next complication is doing chroma DC prediction for non-square blocks.
D
What happens is that the DC prediction works by basically summing up all the pixels to the left and summing up all the pixels above and then taking an average, and when your blocks aren't square, the number of pixels in that sum is not a power of two. So now you have to actually do a division, but the number of different cases there is pretty small.
D
So we can just implement that division with a lookup table, because dividing by either 2 or 3 is not that hard, and AV1 turns out to be adding rectangular transforms with rectangular prediction, so they're going to have to solve this problem anyway, and we'll probably wind up using the same mechanism they did when it comes time for that. Next slide. And then, finally, there's all sorts of fun at the boundaries of the frames.
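(A sketch of the lookup-table division idea mentioned above for the non-square DC average: the divisor is the count of left plus above pixels, which for 2:1 or 1:2 blocks is 3 times a power of two, so a small reciprocal table plus a shift covers every case. The table values are illustrative fixed-point reciprocals, not the AV1 constants.)

    # 16-bit fixed-point reciprocals for the few divisors that can occur.
    RECIP_Q16 = {2: 32768, 3: 21845, 4: 16384, 6: 10923, 8: 8192, 12: 5461}

    def dc_predict(left_pixels, above_pixels):
        total = sum(left_pixels) + sum(above_pixels)
        count = len(left_pixels) + len(above_pixels)
        # Replace the division by a multiply with a table reciprocal and a shift.
        return (total * RECIP_Q16[count] + (1 << 15)) >> 16

    # Example: an 8x4 chroma block has 4 left and 8 above pixels (count = 12).
    dc = dc_predict([100, 102, 98, 96], [100] * 8)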
D
B
D
Yeah, and maybe 128x128 someday. So if you have a large prediction block which overlaps this boundary, but smaller transform blocks inside that large prediction block, some of your transform blocks may be entirely outside that boundary, and those just don't get coded, which is fine, until you also realize that your chroma transform blocks can actually cover a larger area than the corresponding luma transform blocks. This might be easier to see if you go to the next slide with the picture. So here's an example of when this happens.
D
If I have a 32x32 prediction block with 8x8 transforms inside of it, in the luma plane it looks like this: as it runs into this frame boundary, the last four blocks there are just not coded; they just don't appear in the bitstream. But for a 32x32 prediction block with 8x8 luma transforms, the corresponding transform size in the chroma plane for 4:2:0 is also 8x8, which means it actually covers four times the area of a corresponding luma transform block. So now those blocks partially overlap that boundary, and since they're not completely outside, they still get coded, which means I now have a bunch of chroma pixels where there is no corresponding luma pixel to draw a prediction from. So currently what we're doing is just taking the last row of luma pixels and extending them downwards, which is simple and seems to work.
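(A tiny sketch of that boundary workaround: where the co-located luma rows were never coded, repeat the last available luma row downwards before running the CfL averaging; purely illustrative.)

    def extend_luma_rows(luma_rows, needed_rows):
        # Repeat the last decoded luma row to fill rows that were never coded
        # because they fall entirely outside the visible frame.
        out = list(luma_rows)
        while len(out) < needed_rows:
            out.append(list(out[-1]))
        return out

    # Example: only 2 of 4 co-located luma rows exist near the frame edge.
    padded = extend_luma_rows([[10, 12, 14, 16], [11, 13, 15, 17]], 4)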
D
D
It has a small effect on metrics; I doubt anyone would notice looking at the images, and ultimately I have a completely different set of proposals that I hope will clear all of this up, for simplifying how all of this is handled, and not just for CfL but for the whole codec, but I haven't done that work yet.
D
Ultimately your decoder isn't going to emit those things outside the visible frame anyway, is it? So what does it matter what you pick the chroma to be? Yes, exactly, and the answer is that you still have to make an encoder smart enough to encode something for them that doesn't affect other things, because transform coefficients ring across the whole block.
D
So I've included these nice outdated example images; they're about a month old, so they're not that outdated, but things have changed since they were generated. This is current AV1, and the next slide is when we add CfL; I don't know if you can see it, but basically we just get a huge amount of additional detail.
D
Right, and looking at the objective results: these are measured just on still images, since this is an intra prediction mode, so it does not have a large impact on inter frames. We're basically trading off about 0.3% BD-rate in PSNR to gain 5% in CIEDE2000, and CIEDE2000 is a metric which actually contains both luma and chroma, approximately perceptually weighted.
D
So it's doing a CIE Lab conversion and then computing Delta E, so it behaves very similarly to PSNR but has a perceptual weight for chroma in it. You usually sort of expect that small changes in luma can generate large changes in chroma, but once you take luma into account, trading 0.3 for 5 seems like a pretty reasonable trade-off. Still, we'd like to shift some of those gains back into luma, so we're also working on adjusting the luma-chroma balance in the encoder.
D
But currently none of that is at all sane in the way the encoder works; different parts of the encoder actually use completely different weights, so we're trying to sort that mess out, and then maybe we'll have a parameter we can tune to move some of those gains from chroma back into luma and have green numbers across the board. All right, I think that's it. Are there any questions?
D
So I think that's what Steinar was suggesting, right. We tried it in Daala; we have not tried it in AV1 with this new proposal.