From YouTube: IETF93-NETVC-20150722-1550
Description: NETVC meeting session at IETF 93, 2015/07/22 15:50
I guess we've already got an email — sorry, I'm going to tell them about it. No, not yet, we're just starting now. Okay, sure, thank you. So, just before we start: Alice will be taking some pictures of our working group today. If anyone has any objection to that, let us know — I can't imagine there would be.
(Some trouble with the speakers.) Okay — so, the first presentation is going to include some IPR details at the beginning. I just want to reiterate that, while it's fine to talk about factual matters on IPR, we're not going to be discussing any evaluations of IPR or any opinions on the relevance of any IPR in the working group. Everyone is free to form their own opinions about the validity of IPR; we won't be discussing it in the working group.
Thank you very much, Matt. Oh, that's true — we have the photo thing going on. All right, cool. This is the agenda. There's been one slight tweak to it, inasmuch as we had 20 minutes at the end, and Tim decided that if that actually stays on the schedule, it would be a good opportunity to talk about the results of the NETVC hackathon over this past weekend — in case you're interested in that.
So, let me see if I can get a mic going here. I'm going to talk for just a couple of quick slides about the Thor project and the IPR around it, and then after that there will be someone speaking about the actual technical details of Thor, which are far more important. So, next slide, please. So the most important thing — whoops, those are not the slides.
Oh, you need to send me — well, that doesn't really matter much, it's fine. The important thing for me to say — the slide I was expecting — is: we have made an IPR declaration on some of this, so that I don't get any of the, you know, crazy, starry-eyed looks or whatever. Do read the IPR declaration, of course.
Obviously we intend this to go towards the royalty-free stuff, and we'll probably be updating the terms on there over time, because we've been looking at things like the Opus license — the usual Cisco "don't bug us, we won't bug you" type of license. We realized through the whole Opus codec working group process that it was valuable to have all the people who were contributing under a similar license, so we plan to work with people to update the license a little bit over time. So yeah, there's the slide.
Yeah, next slide, I guess. So the way we've been looking at this work and doing it is: we have a technical team that is developing the technical aspects of the codec. Obviously this is a team whose people have worked deeply in codecs for a long time and have a lot of IPR experience to understand the landscape, but they're taking the proposals and passing those over to the legal team.
The legal team is trying to evaluate this, because we don't think it's really possible for us to get towards a royalty-free codec without actually understanding some of these things. The legal team includes some external and internal people — people with strong legal backgrounds, and people with strong video codec IPR backgrounds who have worked in the space a long time. I'll talk a little bit more in the next slide about how they deal with this, but they pass back to the technical team both issues — like, "hey, we think there's a problem here in this type of area,
it might have conflicts with this or that" — and they can also pass back information about IPR that might be useful to solve the same type of problem that they found when searching: IPR that's either old, or IPR they've managed to license in under terms that they think would be acceptable to the working group and meet the sort of royalty-free type terms. So, next slide.
The approach that we're taking to the IPR evaluation is to go gather lots of different patents that we think are worth reviewing and looking at. We do this from a combination of looking at existing patent pools, looking at companies that are well known to have developed IPR in this type of space, and just general searches for words, and so on.
We gather up a big bin of stuff, and from that we need to sort it into the sorts of tools that we're using and applying in the codecs under evaluation, so that we can bin them up. We figure out what our top tools are, and then we evaluate the claims that we found against given types of tools — look at what we have, look at how it's working, and get the feedback from that.
The thing I've missed saying about that: you know, you can never be a hundred percent done gathering. We've made a really strong pass — we've gathered lots of stuff pre-2008, less post-2008, and obviously those are equally relevant — so we're still in the gathering phase too, but we've got a big block. We've gone through a bunch of the tools; we haven't gone through all the tools we've wanted, but the thing that we've been discovering, as we start evaluating these tools, is where we found problems.
This is an ongoing process — you know, we have to do this as we keep iterating the codec and trying it, and we view it as going on over a long period of time. Certainly on the actual approach to the IPR evaluation, we're happy to share the risk in that and do it along with other companies. It's not something we'd expect a working group to do, but if other people want to do that, under the right type of agreements, that's possible.
Certainly we don't want to create an incentive for people to come and join and create non-royalty-free IPR around what we use. So that's basically what we're doing with the IPR on this Thor codec; we're going to present the technical stuff in a bit. If there are any questions on that, I'm glad to answer them.
We want to define a codec that has moderate complexity and can run in real time in software and hardware, and of course it's also possible to extend it to non-real-time purposes. The basic building blocks are well known, so there are no dramatic changes compared to H.264 and H.265 at the very high level. It's the same block structure, with common design elements from other codecs: larger block sizes, transforms, quarter-pixel interpolation. We do have some royalty-free Cisco IPR in the codec.
We are trying to avoid non-royalty-free IPR, but of course, if other companies were to declare their IPR as royalty-free, that could help improve the quality and design of this codec. Next slide. So this is a very high-level block diagram of the encoder. It is exactly the same block diagram that you would find for — that would apply to — H.265 and H.264.
So, at a very high level, the block diagrams are the same — and the same for the decoder on the next slide. So you get the usual stuff: transform coding, entropy coding, intra-frame prediction, loop filters and temporal prediction. Next slide. So let's go into the details. The block structure starts with what we call a super block; that is the block by which you go through the frame in raster scan order.
Next slide. Inter prediction: luma uses quarter-pixel resolution with a six-tap separable interpolation filter, except in the center position, where we have a special non-separable low-pass filter. Some of you might remember back to the early 2000s, when Gisle proposed a special filter in one of the quarter-pixel positions — some people called that the "funny position" — and this is not funny anymore, but it's still a special position with a special low-pass filter, and it gives a coding gain. For chroma it's 1/8-pixel resolution and a four-tap separable filter, and we do support multiple reference frames.
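For illustration only, here is a minimal sketch of how a separable interpolation filter of this kind works in general. The tap values below are generic placeholders (the classic [1, −5, 20, 20, −5, 1]/32 half-pel filter), not Thor's actual coefficients, and the function names are made up for the sketch:

```python
import numpy as np

# Illustrative 6-tap filter (placeholder coefficients, NOT Thor's actual filter).
# A separable interpolation applies the same 1-D filter horizontally, then vertically.
TAPS = np.array([1, -5, 20, 20, -5, 1]) / 32.0

def interp_1d(samples):
    """Filter a 1-D array of integer-pel samples to one sub-pel position."""
    out = np.zeros(len(samples) - 5)
    for i in range(len(out)):
        out[i] = np.dot(TAPS, samples[i:i + 6])
    return out

def interp_2d(block):
    """Separable 2-D interpolation: rows first, then columns."""
    rows = np.array([interp_1d(r) for r in block])
    return np.array([interp_1d(c) for c in rows.T]).T

block = np.random.randint(0, 256, size=(16, 16)).astype(float)
print(interp_2d(block).shape)  # (11, 11): 5 samples of filter support lost per dimension
```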
Next slide. Yeah, transforms — those are the same as in H.265/HEVC, except that we added a 64x64 transform. There is Cisco IPR on the transforms. They are integer approximations to the DCT; there are all sizes from 4x4 up to 64x64, and they have what we call an embedded structure, which means that the elements of the 4x4 transform matrix are a subset of the elements in the 8x8 transform matrix, and so on.
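To make the "embedded structure" idea concrete, here is a small check using HEVC-style core transform matrices (coefficients as commonly quoted for the HEVC 4- and 8-point core transforms — shown only to illustrate the embedding property; Thor's own integer matrices may differ):

```python
import numpy as np

T8 = np.array([
    [64,  64,  64,  64,  64,  64,  64,  64],
    [89,  75,  50,  18, -18, -50, -75, -89],
    [83,  36, -36, -83, -83, -36,  36,  83],
    [75, -18, -89, -50,  50,  89,  18, -75],
    [64, -64, -64,  64,  64, -64, -64,  64],
    [50, -89,  18,  75, -75, -18,  89, -50],
    [36, -83,  83, -36, -36,  83, -83,  36],
    [18, -50,  75, -89,  89, -75,  50, -18],
])
T4 = np.array([
    [64,  64,  64,  64],
    [83,  36, -36, -83],
    [64, -64, -64,  64],
    [36, -83,  83, -36],
])

def is_embedded(small, big):
    """True if `small` equals `big` subsampled in rows and truncated in columns."""
    step = big.shape[0] // small.shape[0]
    return np.array_equal(small, big[::step, :small.shape[1]])

print(is_embedded(T4, T8))  # True: the 4x4 matrix is a subset of the 8x8 one
```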
We are not married to that: if someone wants to contribute a good arithmetic coder, that would be interesting, and that is something we would be happy to consider. But at the moment we have something that is at least very simple, low complexity, and it avoids a lot of IPR — though we would be happy to consider arithmetic coding. Also, as a consequence of the VLC-based approach, some of the block-level parameters need to be coded jointly to get close to the entropy. For the transform coefficient coding,
this is an improved version of what we had in version one of the HEVC reference software. That was removed from the software eventually, because H.265 does not support CAVLC, but we have improved on that scheme since then. So this is what we use for transform coefficient coding right now. Next slide. Yeah — this is on encoder optimizations. This is a non-normative part that we build into our encoder to maximize performance.
Next slide. Yeah, so we have been trying to compare Thor with x265 and VP9, using the HM reference software as the anchor — so three codecs compared to the HM anchor.
The HM is configured with what we call the low-delay configuration, which is no reordering, no look-ahead, and systematic QP variations, so the GOP structure is fixed — it's independent of the content — and Thor is using the exact same constraints. For VP9 and x265 it was a bit more difficult, because it was not possible to configure those codecs to use the exact same GOP structure, so this is not a one-hundred-percent apples-to-apples comparison.
So what you can see here are three different codecs, and for each sequence and each codec there is a number — a bit-rate number — that tells you how many percent extra bits are used compared to the anchor. If you take the average over all the sequences, you get the bottom line, and what you can see is that Thor uses, on average for the same PSNR, twenty-three percent additional bits compared to what VP9 uses.
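To make concrete what a number like "X percent additional bits at the same PSNR" means, here is a simplified sketch. It is not the actual procedure behind the numbers in the table (which presumably uses BD-rate-style tooling); the rate/PSNR points are hypothetical:

```python
import numpy as np

def percent_extra_bits(anchor_rate, anchor_psnr, test_rate, test_psnr):
    """Average percent bitrate overhead of `test` vs. `anchor` at matched PSNR.

    Simplified illustration: interpolate log-rate as a function of PSNR for both
    codecs and average the rate ratio over the overlapping PSNR range.  (The
    usual Bjontegaard-delta procedure fits polynomials instead; this is only a
    sketch of the idea.)
    """
    lo = max(min(anchor_psnr), min(test_psnr))
    hi = min(max(anchor_psnr), max(test_psnr))
    grid = np.linspace(lo, hi, 100)
    log_anchor = np.interp(grid, anchor_psnr, np.log(anchor_rate))
    log_test = np.interp(grid, test_psnr, np.log(test_rate))
    return (np.exp(np.mean(log_test - log_anchor)) - 1.0) * 100.0

# Hypothetical rate (kbps) / PSNR (dB) points for an anchor and a test codec.
anchor = ([500, 1000, 2000, 4000], [33.0, 36.0, 39.0, 42.0])
test   = ([600, 1200, 2400, 4800], [33.0, 36.0, 39.0, 42.0])
print(round(percent_extra_bits(*anchor, *test), 1))  # ~20.0% extra bits
```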
In the table, and on the next page, cpu-used is zero. I have additional results where I change the complexity setting and get different results, but these were the high-complexity operating modes for all the codecs. (Okay, thank you.) But to go back: I have been in contact with people at Google to discuss these settings, and it seems like they have a preference for two-pass encoding, which didn't fit the low-delay constraints — so maybe that explains some of the differences.
So I think this kind of highlights the need, in the testing draft, for a really good understanding of how to do apples-to-apples comparisons among these codecs, and for giving each of the codec teams the onus of putting in the best settings to make their codecs look the most flattering, so that when we do our testing we can be fairly sure we have good results and good numbers out of it.
Yeah. So I should explain this: on the horizontal axis are the bit-rate numbers from the table — at least for the leftmost point on each curve — so this shows the number of bits in addition to the anchor. On the vertical axis is the frame rate for this particular sequence during single-core encoding. There is one curve for each codec, and each codec has multiple operating points, from high complexity on the left to low complexity on the right.
And as you go along the curve, you increase the number of bits, but you also increase the speed of the encoder. Of course the goal is to achieve something towards the upper-left corner of this diagram: the highest possible frame rate and as close to the anchor as possible — maybe even better. This is where we are today. I expect that we will improve in the next few months, certainly on the bandwidth axis, but maybe even more on the vertical axis, because we have only just started the work to optimize for speed.
Question from the floor: I have a question regarding your choice of using B-frames in the low-delay mode. I would suppose people would typically try to, you know, avoid the extra delay of B-frames. Can you explain, if you use them, what extra delay penalty you introduce, and whether that still fits — what is the low-delay part, how low a latency can it support?
Tim Terriberry from Mozilla. With regard to the entropy coding: you're perfectly free to steal the arithmetic coder in Daala. It should actually be fairly easy to convert over, because we support raw bits, which are basically what you're doing now. So you could basically just find your put-bits interface, replace that with a call to ours, and then convert symbol by symbol to using arithmetic coding — so that would probably be relatively straightforward to do.
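A rough illustration of the migration path being described: keep the existing put-bits call sites working by routing raw bits through an entropy-coder adapter, then move symbols over one at a time. All class and method names below are made up for the sketch; they are not Daala's or Thor's actual APIs:

```python
class EntropyCoderAdapter:
    """Same put_bits() signature as a raw bit writer, so old call sites don't
    change; new code can call encode_symbol() with a probability model instead."""
    def __init__(self, coder):
        self.coder = coder

    def put_bits(self, value, nbits):
        # Raw bits are just symbols coded with a uniform (50/50) model.
        for i in range(nbits):
            self.coder.encode_symbol((value >> (nbits - 1 - i)) & 1, p_one=0.5)

    def encode_symbol(self, symbol, p_one):
        self.coder.encode_symbol(symbol, p_one)

class ToyCoder:
    """Placeholder for a real arithmetic/range coder."""
    def __init__(self):
        self.symbols = []
    def encode_symbol(self, symbol, p_one):
        self.symbols.append((symbol, p_one))

writer = EntropyCoderAdapter(ToyCoder())
writer.put_bits(0b101, 3)              # old call site, unchanged
writer.encode_symbol(1, p_one=0.9)     # new call site, modeled symbol
print(writer.coder.symbols)
```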
Steve Botzko from Polycom — not a question exactly, just a comment. I think it's great that we have another candidate; it looks like a wonderful piece of work. I like the fact that you're vetting the IPR so carefully, and I like the attention to medium complexity. I think this will be very, very helpful for us to be evaluating.
So, a chair point on the multiple candidates: I think we're pretty clear that the output of the working group is going to be a single codec, all right? So we need to make sure that we understand all the candidates and all their best parts, and figure out the right technical solutions to arrive at a single NETVC codec at the end of the working group's work.
All right, so my name is Nathan Egge, I'm from Mozilla. Today I'll be talking to you about a draft that we submitted around one of the coding tools we're using in Daala. Next slide. So today I'll be talking about lapped transforms, which are not a new idea: they were originally proposed for still-image coding by Henrique Malvar in 1989. The idea is to apply a pre-filter across block boundaries
— one that's invertible and that removes spatial correlation between the blocks. This pre-filter has two benefits: the idea is to improve coding performance, but also to have something that can be applied on the decode side to remove blocking artifacts. This was originally used in audio, and the idea there was that blocking artifacts end up being very audible, so they needed a technique like this. It was not widely adopted in video because of some of the problems that come up with it.
Let's go to the next slide. Now I'll show you what this looks like. The pre-filter, if you apply it to just an image, ends up decorrelating adjacent blocks, and so your image ends up being blocky — that's what's shown at the top. If you compare what happens when you take an original image and apply just the DCT and quantization, you can see you get these blocking artifacts along block boundaries. If you take the same image, apply a pre-filter, the DCT and the same quantization, and then inverse the DCT and invert the pre-filter, the blocking artifacts are much reduced.
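Here is a minimal 1-D sketch of that pipeline — an invertible pre-filter applied to the samples straddling each block boundary, a blockwise DCT with quantization, then the inverse DCT and the post-filter. The 2x2 rotation used as the boundary filter is a toy choice for illustration; these are not Daala's actual lapping filters:

```python
import numpy as np
from scipy.fft import dct, idct

BLOCK = 4
theta = np.pi / 8
PRE = np.array([[np.cos(theta), np.sin(theta)],
                [-np.sin(theta), np.cos(theta)]])   # invertible boundary filter
POST = PRE.T                                        # inverse of a rotation

def prefilter(x):
    y = x.astype(float).copy()
    for b in range(BLOCK, len(x), BLOCK):           # every interior block boundary
        y[b - 1:b + 1] = PRE @ y[b - 1:b + 1]
    return y

def postfilter(x):
    y = x.copy()
    for b in range(BLOCK, len(x), BLOCK):
        y[b - 1:b + 1] = POST @ y[b - 1:b + 1]
    return y

def encode_decode(x, qstep):
    y = prefilter(x)
    coeffs = dct(y.reshape(-1, BLOCK), norm='ortho', axis=1)
    coeffs = np.round(coeffs / qstep) * qstep        # quantize per block
    rec = idct(coeffs, norm='ortho', axis=1).reshape(-1)
    return postfilter(rec)

x = np.linspace(0, 255, 32)                          # smooth ramp across 8 blocks
print(np.abs(encode_decode(x, qstep=20) - x).max())
```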
So let me describe how we use these in Daala, on the next slide. Excuse me. So yeah, the pros we have here using lapped transforms are that you have a larger spatial extent, because we're doing this lapped transform that crosses block boundaries: the total transform, which is the lapped transform plus the DCT, has a larger support area, and so we end up getting an improved coding gain just by using the lapped transforms. And we did an experiment:
we took data from a set of still images. It's a comparison where we were using the KLT for 4x4 blocks everywhere, as compared to the DCT, and you can see that the DCT gets similar performance when we apply lapped transforms as the KLT does — we get the same kind of gain. What's really fascinating about this is that on the 4x4 blocks — on smaller blocks — the benefit is almost a decibel; as you go to larger blocks, that kind of falls off, so at 16x16…
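For readers who want "coding gain" made concrete, the standard transform coding-gain measure for a stationary source is the ratio of the arithmetic to the geometric mean of the transform-coefficient variances. The sketch below evaluates it for the 4-point DCT on an AR(1) model; this is the textbook definition, not necessarily the exact methodology behind the Daala figures:

```python
import numpy as np
from scipy.fft import dct

def ar1_covariance(n, rho=0.95):
    """Covariance matrix of a unit-variance AR(1) source with correlation rho."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def coding_gain_db(transform_rows, rho=0.95):
    """Arithmetic/geometric mean ratio of coefficient variances, in dB."""
    R = ar1_covariance(transform_rows.shape[0], rho)
    var = np.diag(transform_rows @ R @ transform_rows.T)
    return 10 * np.log10(var.mean() / np.exp(np.mean(np.log(var))))

T4 = dct(np.eye(4), norm='ortho', axis=0)   # rows are the 4-point DCT basis functions
print(round(coding_gain_db(T4), 2))
```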
One of the cons is that the neighboring pixels are no longer available to the decoder, and so being able to predict a block from its spatial surroundings doesn't work. So in Daala we had to come up with other techniques for that, which do not have the same direct benefit as doing spatial prediction. All right, next slide. So what we currently have support for in Daala is the following block sizes: 4x4 up to 64x64, which is in progress.
Yes — so, one little thing: we can use 4-point and 8-point filters. We apply 8-point filters across larger blocks and 4-point filters across the 4x4 blocks; when we split an 8x8 block down to four 4x4s, we then apply a four-point filter on the interior edges, and I'll show a demonstration of that shortly. For chroma in 4:4:4,
we do exactly the same thing as in luma. When it's 4:2:0, it'll use a four-point filter everywhere, and this is so that the filters along block boundaries have the same spatial extent, because we do things like chroma-from-luma and other prediction. And then the important thing to note is that the lapping size does not depend on the neighbor's block size. So as you recurse through your block-size decision, changing your neighbor's split decision doesn't impact the lapping along edges, and this allows us to do the block-size search efficiently.
So this is the order in which we apply filters, starting at the super block level — in Daala we currently have 32x32 superblocks, and will be moving to 64x64 superblocks. You apply the 8-point filter across the top and bottom edges; next slide, you then apply it across the left and right edges; and then, next, you apply it across the horizontal edge and the vertical edge, and now you can recurse and do the same technique for all the interior edges of your blocks below that.
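A rough sketch of that edge-visiting order for one superblock. The dict/list split-tree representation and the filter-length rule are simplifications for illustration, not Daala's actual data structures:

```python
def lap_edges(x0, y0, size, split):
    """Interior edges of a (possibly split) block, in lapping order:
    horizontal interior edge, vertical interior edge, then recurse into children."""
    if not split:
        return
    half = size // 2
    flen = 4 if half == 4 else 8          # 4-point filters once edges reach 4x4 blocks
    yield ('h-interior', (x0, y0 + half), size, flen)
    yield ('v-interior', (x0 + half, y0), size, flen)
    children = [(x0, y0), (x0 + half, y0), (x0, y0 + half), (x0 + half, y0 + half)]
    for (cx, cy), child in zip(children, split):
        yield from lap_edges(cx, cy, half, child)

def superblock_edges(size, split):
    """Outer edges first (top/bottom, then left/right), then interior edges."""
    yield ('top', (0, 0), size, 8)
    yield ('bottom', (0, size), size, 8)
    yield ('left', (0, 0), size, 8)
    yield ('right', (size, 0), size, 8)
    yield from lap_edges(0, 0, size, split)

# 32x32 superblock whose top-left 16x16 child is split again into four 8x8s.
tree = [[None, None, None, None], None, None, None]
for edge in superblock_edges(32, tree):
    print(edge)
```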
They have perfect reconstruction — that is, for any input x, applying the forward lapped transform followed by the inverse lapped transform gives back the same value of x. This is important because, when you have similar content frame by frame, you would like the same values to come out of your inverse transform so that there's no rounding error that accumulates. The next item is that they are biorthogonal, which means that the lapped transforms do introduce some scale factors.
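The perfect-reconstruction property is easy to demonstrate with an integer lifting-style pre/post filter pair, where each lifting step is undone exactly. The filter below is a toy, chosen only so the round trip is bit-exact; it is not Daala's actual lapping filter:

```python
import numpy as np

def pre_pair(a, b):
    b = b - a
    a = a + (b >> 1)
    return a, b

def post_pair(a, b):
    a = a - (b >> 1)
    b = b + a
    return a, b

def forward(x, block=4):
    y = list(x)
    for i in range(block, len(y), block):      # pair straddling each block boundary
        y[i - 1], y[i] = pre_pair(y[i - 1], y[i])
    return y

def inverse(y, block=4):
    x = list(y)
    for i in range(block, len(x), block):
        x[i - 1], x[i] = post_pair(x[i - 1], x[i])
    return x

x = [int(v) for v in np.random.randint(0, 256, 16)]
assert inverse(forward(x)) == x                # exact round trip, no rounding drift
print("perfect reconstruction holds")
```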
With these scale factors you have some correlation between the coefficients, and we're not exploiting that correlation. There's a dynamic range expansion from using these lapped transforms. The core DCT we designed as an orthonormal transform, which means there's minimal range expansion from that; the pre- and post-filters add about one or two bits, depending on the scale factors. Because the DCT is done with an integer approximation, for very small inputs there's a lot of rounding.
So we get around this by scaling all of our inputs up by four bits and then, on the inverse side, once we've done the inverse transform, scaling them back down. This has the effect that, for blocks larger than sixteen by sixteen, we can no longer fit coefficients in 16 bits, and that will have an impact on SIMD, perhaps, when we get to doing that optimization. Next slide — and that's it.
So I think there are two things. One is that, you know, in the audio field this is used extensively, everywhere, because of the reduction in blocking artifacts, and so we already kind of had an idea that this might be something interesting. We thought that the coding gain might be a bigger deal, but it turns out that on some of the larger blocks the improvement from using the lapped transform was reduced, and in particular moving to these 4-point and 8-point filters —
Jean-Marc from Mozilla. Actually, in some sense the coding gains reported here are kind of misleading, in the sense that, especially for large blocks, in practice large blocks are actually where lapping benefits the most, because of the reduced blocking artifacts. For example, the 8-point lapping is wider than typical adaptive loop filters, so it creates less blocking on larger blocks, despite the fact that the coding gains that are measured — which are like theoretical measurements — do not show that. And even on smaller blocks…
Mo Zanaty again, at the mic. So what Jean-Marc just said struck me: it may have better subjective performance than some objective metrics currently show. Does that highlight the need, in the testing draft, to try to find metrics that capture artifacts like blockiness better?
So — among the metrics we currently use for Daala on Are We Compressed Yet, one of them is called Fast MS-SSIM, and it is very sensitive to blocking artifacts. It has its own issues — it is by no means perfect — but I believe having many metrics is a good thing, and at least they can help: when one behaves very differently from the others, it's at least a sign that one should pay very close attention.
Yeah, so Tim Terriberry, Mozilla, again — just responding to Mo. So Google has a metric that they designed to detect blockiness, and I'm sure there are plenty of other ones around, but that particular one is open source and we've, you know, looked at integrating it into our Are We Compressed Yet tool. It just hasn't happened yet, but we plan to do so.
The draft that I submitted is implemented in the context of the Daala codec, but it would actually be applicable to pretty much any other codec, because it's a completely separate coding technique. Next slide. So there are several properties that screencasting content has compared to normal photographic-type video.
This is a possibly non-exhaustive list. The first property is the only one that I really addressed in this draft, and it is being able to properly encode anti-aliased text without making too much of a mess out of it — and I'm talking about encoding in the pixel domain; I'm not addressing any sort of vector side channel. There are other special properties: for example, the content tends to have many horizontal and vertical edges, like window borders and things like that.
It tends to have a reduced number of colors — at least in many blocks it does. Also, in terms of motion, the motion tends to be rectangular, because people move windows around, so that is also very common. I did not address any of these except for the first one; if you can think of others that we should consider, then I'd like to know. Yeah.
Jonathan Lennox: The other thing — I think the temporal qualities of screencasting are also very interesting, because you tend to have nothing happening for a very long time, but then all of a sudden everything changes at once. Basically, if you hit the next slide, boom — other than the white background, all the pixels just changed; but before that it was completely static for 15 seconds or a minute or something. So I think that's the other interesting difference with screencasting. Yes.
Mo Zanaty again, at the mic. I think when people say screencasting, there's usually a very varied perspective of what the word means — maybe we should highlight this more clearly in the requirements draft — but it could mean anything from what we think of as presentation casting to, you know, wireless display. And of course in wireless display, all bets are off as to what the content actually is; it most likely is not just static — people don't present to themselves.
I
Well,
one
thing
implicit
that
I
should
have
mentioned
also
is
that
all
of
this
should
probably
be
on
a
like
switchable
inside
a
particular
frame
like
obviously,
if
you
do
remote
desktop,
you
might
have
a
with
a
movie
playing
somewhere,
and
you
want
to
be
able
to
code
both
in
the
same
frame.
So
I
think
this
probably
addresses.
Next slide, please. Okay, so the approach that I'm presenting here is based on the Haar wavelet. It is the very simplest wavelet one could possibly use, and it is absolutely terrible for use on any sort of natural images; however, it has interesting properties for the very specific case of screencasting.
So, very quickly: on the left you can see the actual mathematical definition of the Haar transform, for a one-dimensional, two-point transform; on the right is how the decomposition works, in the example of a 2-D transform over four by four points. So you have only a single DC over that block, and you have very localized basis functions, especially for the high frequencies — and this is the idea: for text you want to reduce ringing, so you can do it that way.
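As a sketch of what that decomposition does (using the standard orthonormal two-point Haar, s = (a + b)/√2 and d = (a − b)/√2, applied along rows and columns and then recursed on the low-pass quadrant — the draft's actual integerized transform may differ):

```python
import numpy as np

def haar_1d(x):
    a, b = x[0::2], x[1::2]
    return np.concatenate((a + b, a - b)) / np.sqrt(2.0)

def haar_2d(block):
    out = block.astype(float).copy()
    n = block.shape[0]
    while n > 1:
        sub = out[:n, :n]
        sub = np.apply_along_axis(haar_1d, 1, sub)   # rows
        sub = np.apply_along_axis(haar_1d, 0, sub)   # columns
        out[:n, :n] = sub
        n //= 2                                      # recurse on the LL quadrant
    return out

block = np.arange(16, dtype=float).reshape(4, 4)
coeffs = haar_2d(block)
print(round(coeffs[0, 0], 3))   # single DC term: block mean times block size
```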
So once we have the actual transform, we need to encode the quantized coefficients, and this is done with what I call the L1-tree wavelet encoding. This is based on the tree structure of the wavelet transform, kind of similar to other tree-based techniques like EZW, the embedded zerotree wavelet.
In this case, the main difference is that the tree is based on the sum of the absolute values over the entire tree. So the very first thing we encode is the sum of the absolute values of the coefficients for the entire block, and then we say how this sum is distributed between horizontal, diagonal and vertical. Then, knowing what the sum is, for each direction we start with the top-level tree and we recurse down, saying how the sum is split between the parent coefficient and the four children coefficients.
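A rough sketch of that value decomposition — total sum first, then how each subtree's sum splits between the parent coefficient and its four children. Only the sequence of signalled quantities is shown, with a hypothetical flat tree indexing; the actual entropy coding in the draft is not reproduced here:

```python
def children(node, n):
    # Hypothetical flat indexing: node k has children 4k+1 .. 4k+4.
    return [k for k in range(4 * node + 1, 4 * node + 5) if k < n]

def subtree_sum(coeffs, node):
    return abs(coeffs[node]) + sum(subtree_sum(coeffs, k) for k in children(node, len(coeffs)))

def encode_tree(coeffs, node, messages):
    total = subtree_sum(coeffs, node)
    kids = children(node, len(coeffs))
    if total == 0 or not kids:
        return
    # Signal how the subtree sum splits between this coefficient and each child subtree.
    split = [abs(coeffs[node])] + [subtree_sum(coeffs, k) for k in kids]
    messages.append((node, total, split))
    for k in kids:
        encode_tree(coeffs, k, messages)

coeffs = [3, 0, 2, 0, 1, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
msgs = []
total = sum(abs(c) for c in coeffs)   # first signalled quantity: sum for the block
encode_tree(coeffs, 0, msgs)
print(total, msgs[:3])
```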
Next slide. So I hope this shows up relatively clearly. This is really a magnified image of a screenshot that I took — the fact that it's magnified may change some artifacts, but it should give an idea. I'm going to show four images at exactly the same rate; this is a crop of a much larger image that was encoded at the same size for all the different codecs. So this is what we get right now with JPEG. Next.
This is what we get right now — or a few weeks ago — with the Daala lapped-transform-based encoder. So it's not really great in terms of ringing: lapping is really good for many things, but for text it is terrible. Next slide. So this is what we can get with the simple Haar scheme that I just presented.
So this is what we get with Haar, which has much less ringing around the text, and you can compare with this, which is x265 — on text-only images it actually performed slightly worse than Haar. However, in this case it's better, because it handles all of the long lines much better than the Haar transform, which is very localized; especially the icons look a lot better in x265, but at least the text with the Haar transform looks pretty good. Next slide.
So in terms of objective evaluation, there is currently a screenshot test set in our Are We Compressed Yet, composed of about a dozen screenshots or so. It is very preliminary and can probably be improved a lot; if some of you have a better test set that we can use, that would be very much appreciated. What we have is just random things
we got from Wikipedia — screenshots that weren't compressed before. In terms of metrics on Are We Compressed Yet, we have PSNR, PSNR-HVS, SSIM and Fast MS-SSIM, and at this point it is not clear whether any of these is any good on screenshots. So far it appears that PSNR-HVS is the least wrong, but I would not trust these metrics very much at this time, so I prefer looking at it, so far.
So, two comments. One is regarding the visual comparison you showed: I would have assumed that people might have preferred x264 — versus the Haar — as you explained, even though you have text, you know, sort of screen sharing with some text. But apparently x265 — sorry — handles the mixture of content very well, even when you're just sharing a document. Yeah.
So right now the comparisons that I'm showing here are based on applying the Haar transform everywhere in the image, which is actually not that good in many cases — for example, there are places where there are gradients and things like that, which are really not well handled by Haar — so overall x265 actually does better than what I'm presenting here.
Out of the test set we have on Are We Compressed Yet, the only image where the Haar transform actually performed better than x265 was the one that has only a lot of text and nothing else — on that one we actually perform better. On the image here, what happens is that the Haar coder needs to spend a lot more bits on the icons and things like that, so it has to spend fewer bits than x265 on the text, and it ends up slightly worse. And again —
Absolutely — this is an absolute requirement, because you cannot assume that the entire frame will be made of text. Some parts will be icons and lines, and some parts will be just natural content, and we need to be able to handle all of this — which is not at all implemented at this point.
The second comment is regarding the metrics that you've been trying so far. Based on my understanding, PSNR is a purely statistical one, and aside from that, all the others — the SSIMs — are really designed to reflect statistics from natural images. So that's probably why, from the very start, they are not good candidates to consider. It would be nice to consider things which match the visual characteristics of screen sharing — things like the reduced color space, straight lines, etc. Yeah.
Mo Zanaty again, at the mic. So I was surprised that you were targeting anti-aliased text with Haar, because I actually would have thought that just straight lines, or non-anti-aliased text, would be where Haar outperforms things like x264 encoding — the more anti-aliasing you do, typically the better that encoding ends up looking. Have you tried this on something like an Excel spreadsheet, with lots of sharp horizontal and vertical lines, and maybe non-anti-aliased text?
Yeah — so, actually, things like spreadsheets are one of the worst issues we have right now with the Haar transform. If you go back to the slide where I'm showing the basis functions — I believe this is the third slide or something — yeah, here. What happens is that our Haar high-frequency basis functions are very narrow, so if you have a line that spans the entire block, we have an entire line of nonzero coefficients, whereas with the DCT you can represent this more compactly. So this is one area of improvement
that I'm looking at right now: how to either extend the transform or use a different decomposition, to be able to have a more compact representation for purely horizontal lines. This would very much help — essentially, the place where x265 does best compared to what I'm presenting here is actually spreadsheets, because there are lots of horizontal and vertical lines, which the DCT is not so bad at representing and Haar is absolutely terrible at. So this is a known place where what I'm presenting can actually be improved.
All right, thank you. So this weekend we had a hackathon — well, for a large number of working groups, but one of them was NETVC — and there we ran a bunch of experiments on both Thor and Daala, and I thought that would be of interest to this group. So the first thing we tried to do was integrate Thor into Are We Compressed Yet, which is the website that we use to test Daala. We had to disable B-frame support, which seems to be okay, since it's not very well tuned anyway.
But the reason for that is that the current implementation requires the frame count of the video to be a multiple of the GOP size, which is currently 12 frames, and not all of our videos actually met that requirement, so that would have screwed up our numbers — but we'll get that resolved eventually. So, next slide. So here's a comparison of the two codecs via PSNR: the muddy yellow one, or whatever it is, at the top is Thor, and the blue line underneath it is Daala.
So it's currently showing that Thor has a 43.5 percent rate advantage over Daala on PSNR, which is not too surprising, since we have not optimized for PSNR — in fact, we've intentionally done lots of things that make PSNR worse. Two of those things are that we use quantization matrices and activity masking, and those are relatively easy to shut off. So we did — next slide — and that reduced the gap to about twenty-three point six percent, which, you know, is a sizable amount, but by no means all of it.
So according to PSNR, Thor is doing much better. Next slide: on PSNR-HVS the results are a bit more mixed. The BD-rate difference is less than one percent, but you can see that Thor is doing better at the low rates and we are doing better at high rates — in particular we're doing better at rates that are probably so high they're not actually practical, but —
But again, it's more mixed. Then, next slide: if we look at Fast MS-SSIM instead, the story is the complete opposite — it says Daala is ninety-one percent better than Thor, basically across all rates. So I don't know what any of this means. This will probably involve actually spending some time looking at images and videos, as opposed to staring at curves, to figure out who's doing better in what scenarios and what conditions, etc.
But the good news is that the two contributions do appear to perform very differently, so, you know, we may be able to take the best of both and wind up in a much better place. So, next slide.
So then we wanted to start to understand in a little bit more detail what is responsible for the differences in performance, and so one experiment we tried was to basically take Daala, rip out all of Daala's motion compensation and replace it with Thor's. The motion compensation in Daala is relatively decoupled, so it was actually a fairly easy experiment to run. We ran four different variations of this, which I'll describe in the next few slides.
So if you recall the block diagram for Daala from Monday's session, we basically took the OBMC block right there — next slide — and replaced that with Thor. So everything else is still running and still using Daala: Thor forms the prediction frame, and we run lapped transforms on that prediction and lapped transforms on the inputs, and both of those go to PVQ and use all our quantization and entropy coding and all that. So, again, because this is running in Daala, it doesn't do any multiple references.
So for the first experiment, in Thor we disabled residual coding, because we're trying to use this to make a prediction for Daala; we disabled all of the intra modes, because Daala does not have intra modes in its motion compensation; and we disabled 64x64 blocks, because Daala's motion compensation only goes up to 32x32. And you can see from that that Daala is about 24 to 28 percent better with a basically unmodified Thor, except for just shutting these things off. But this is not really a fair comparison.
So, in this case, the yellow line there is the Thor one, which is underneath the blue one, so Daala is better. But this is, as I said, not really a fair comparison, because Thor is spending a bunch of bits in order to encode the possibility that it might use some of these modes and things that we've disabled. So the next thing I did was go around and basically disable the coding bits for those things.
So it's now actually much closer — within ten percent of where Daala is — and that was the second experiment we ran. Typically you would expect OBMC to be doing about one decibel better, which, on the first slide, was roughly where we were; but it turns out, when you disable all these bits, it's actually much closer than you might expect. So maybe there's room for improvement for us there. So, next slide. So the third experiment we ran was to re-enable the intra modes and add them back to the VLC,
so that we actually had the cost for them. And I should point out, by the way, that for all these experiments I didn't actually write a decoder — this is all changing the encoder only, so I could very easily have screwed something up in all this stuff. But we re-enabled these intra modes, and the numbers here at the top are comparing the previous Thor experiment to the current Thor experiment.
This is not Thor versus Daala anymore; it says that just turning on those intra modes — so that we have some kind of intra prediction in our motion compensation, for places where the motion estimation is not going to do a good job — makes less than a two percent difference, which was somewhat reassuring, since the fact that we don't have this in Daala has been something that we thought might be a severe performance limitation for a long time.
The actual performance limitation in Daala is probably a bit higher than this, for various different reasons, but this at least gives you some kind of ballpark — and it was a 15-minute experiment. So it's nice to have a nice, small and simple codebase like Thor to run some of these on; doing this experiment in Daala would have been much more difficult. So, next slide.
So the final experiment was basically the "cheating" experiment: this turns on all the things in Thor that it would make sense to turn on — specifically 64x64 blocks, and also coding the splits for those again — and the surprising thing, to me at least, was that this still makes between a seven and twelve percent difference, depending on which metric you look at, and all of that mostly shows up at low rates.
The slightly curious thing is that at very high rates you can actually see that this made things worse for Thor. So again, these are numbers comparing the previous Thor experiment to this Thor experiment; but if you go to the next slide, this actually puts Thor ahead of Daala when you turn all of these things on. So that suggests there's some room to improve Daala by adding 64x64 blocks, and hopefully we can get similar improvements out of it. All right, next slide.
So what we did at the hackathon was just a very simple hack, where we have no signaling: we decide on a superblock-by-superblock basis whether to enable the constrained low-pass filter, but we didn't actually code anything to say whether or not we were doing that. We have better patches now that are showing real gains, but I don't have them on these slides.
But the nice thing about this is that it actually solves a long-standing quilting artifact that we had observed in fades at low rates, which was actually pointed out to us by Thomas Davies. So, next slide — good — and it actually shows up on the screen. So if you have a fade to or from black, at very low rates we would get what you can see here: these quilting artifacts; they actually show up.
So that looks very not-wonderful, and we had done some analysis and understood why this was happening — which is complicated, and I can go into it in detail if you want, but you probably don't want me to. The best solution that we had come up with previous to this was to switch to 12-bit reference frames. As Nathan mentioned, we scale up all of our pixels by 16 before running them through our transforms, and then we scale them back down to 8 bits before storing them in our reference frames.
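A small illustration of the precision difference being discussed — Daala works internally on samples scaled up by 16 (4 extra bits), and the question is whether reference frames keep that precision (12-bit storage) or get rounded back down to 8 bits. This is only a precision illustration, not the full explanation of the quilting artifact, which the presenter describes as more complicated:

```python
def to_internal(pix8):            # 8-bit pixel -> internal 12-bit scale
    return pix8 * 16

def store_ref_8bit(internal):     # round back down to 8 bits for the reference
    return (internal + 8) // 16

def store_ref_12bit(internal):    # keep full internal precision
    return internal

# A very slow fade: the internal value creeps up by a quarter of an 8-bit level per frame.
internal = to_internal(100)
for frame in range(8):
    internal += 4
    print(frame, store_ref_8bit(internal), store_ref_12bit(internal))
# The 8-bit reference only moves once every few frames, in visible jumps,
# while the 12-bit reference tracks the fade smoothly.
```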
Well, if you don't do that scaling back down, this artifact goes away — but that also doubles the amount of memory you need for your reference frames, increases your memory bandwidth, and has lots of other bad effects. So we thought maybe this constrained low-pass filter would be able to solve the problem without that expense. And indeed, if you go to the next slide, it makes this basically go away. So that was a nice positive development, and those were the results from our hackathon. So — are there any questions?