From YouTube: IETF98-CODEC-20170330-0900
Description
CODEC meeting session at IETF98
2017/03/30 0900
C: A .patch, I don't think, is a viable format to submit slides in. You uploaded a .patch to the proceedings? Yes, we did.
A: We have two. One is the draft that you're talking about today, which was supposed to be done in November, so I guess we'll have to update that date. The other one was the bug-fix draft, which has passed working group last call and was waiting on us to figure out how to get a permanent URL for the new test vectors, which I think has now been sorted. So now it's just waiting on me to do the write-up and hand it off to them, so that should hopefully happen pretty soon.
G: Okay, so the agenda is fairly straightforward. We're going to start with a foundation on ambisonics, what that format is and how it relates to spatial audio and 3D audio, and then we'll discuss our spec proposal for adding ambisonics to Opus, including the new proposed mappings, as well as what kinds of calculations would be involved in those mappings.
G: We hear it as a composite signal and we interpret it as, oh, it's over here, it's over there. This is basically how HRTFs, head-related transfer functions, work: you have a filter that describes that path distance to each ear for a given position. What ambisonics allows us to do is not just model one point at a time; it allows us to model the entire sound field around the head. So this has a lot of advantages in video games, in VR audio, in 360 video.
G: Well, if we pretend that this blue sphere is not an omnidirectional microphone but a spherical microphone, some sort of spherical capsule that can capture sound across its whole surface, and if I were to place sources right there and there, let them ring out, and have them capture onto the sphere, you see that they would be captured at different points on the microphone. This is like a particular time snapshot, and this is telling you maybe the energy at that particular time snapshot.
G: Then what we have is some representation of two signals arriving at two different locations on the sphere. This is what reality would be, and we can use truncated spherical harmonics, otherwise known as ambisonics, as a way to represent an nth-order approximation to this function.
G: So if we consider first-order ambisonics: ambisonics in general consists of what are called spherical harmonics. Spherical harmonics are a set of orthogonal basis functions that describe shapes on a sphere, or positions on a sphere, however you want to think about it, and you can truncate the series. This basically controls the resolution, the spatial acuity, at which you can describe the shape contour of the pattern you're trying to describe. If we use just first order, we get four components.
G: We get the one at the top as an omnidirectional channel, and then we get these three directional modes, one along the X, one along the Y, one along the Z axis, and with just these four channels we can then express directional signals that arrive from all directions. So if we wanted to use just these four channels, we can get a first-order approximation, which looks like this. Now, that doesn't look particularly great, but it has the right directivity, more or less.
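As a rough sketch of what those four components carry, here is how a single mono sample at a given direction could be encoded into first-order W/X/Y/Z. The SN3D normalization and ACN channel order (W, Y, Z, X) are assumptions on my part; the talk does not pin down a convention:

```python
import math

def encode_first_order(sample, azimuth, elevation):
    """Encode one mono sample into first-order ambisonics.

    Returns channels in ACN order (W, Y, Z, X), SN3D-normalized.
    Angles are in radians; azimuth 0 is straight ahead.
    """
    w = sample                                          # omnidirectional
    y = sample * math.sin(azimuth) * math.cos(elevation)
    z = sample * math.sin(elevation)
    x = sample * math.cos(azimuth) * math.cos(elevation)
    return [w, y, z, x]
```

A source straight ahead (azimuth 0, elevation 0) lands entirely in W and X; a source directly overhead lands in W and Z.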
G: This is all fine and dandy for very wide, diffuse fields where you have, for example, like you're out near the ocean and you're just recording a soundscape around the ocean, something where the directivity doesn't matter as much. But in our example here we had something that had very, very sharp directivity to it, with two very distinct sources. So for that we would need to extend up to include more channels, otherwise known as higher-order ambisonics. What you see here is third-order ambisonics.
G: We introduce additional basis functions, and these are more spherical-harmonic shapes that contribute to effectively a higher spatial resolution that we can use to describe the scene. A third-order approximation gives us a much better fit to reality, and obviously, as you go up, you can get a finer and finer approximation. Currently there are a lot of systems that use third order, but we pretty much refer to anything above first order as higher-order ambisonics.
G: Third order is the goal right now, but, as you can see, the number of channels rises quadratically with the order. So, given more bandwidth and given better compression schemes, we might be able to expand to higher orders. But the point is that with ambisonics there is a defined number of channels for a given order, and the order can vary depending on the content, and you get closer and closer to what you actually had in reality.
G: Okay, so how these systems are typically rendered is usually through either a loudspeaker array, like what you see here, or through a virtual loudspeaker array, which would be the same kind of representation but replacing the physical loudspeakers with some set of HRTF filters corresponding to each loudspeaker. What you do, then, is you take the ambisonics signal, which has that representation of spherical-harmonic modes, and you project it into what's called the loudspeaker space.
G: The projection typically involves a pseudo-inverse of what's called the encoding matrix. So you get a decoding matrix that projects your ambisonics signal onto some defined loudspeaker array like this, and if you did this binaurally over headphones, you would have a corresponding HRTF for each one of these, for both the left and right ear. So in this case, I think this is 32 speakers.
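The "pseudo-inverse of the encoding matrix" step can be sketched in a few lines. Everything concrete below (a hypothetical 4-speaker horizontal square, first-order channels in ACN order W/Y/Z/X) is an illustrative assumption, not something specified in the talk:

```python
import numpy as np

# Hypothetical virtual array: 4 loudspeakers on the horizontal plane.
az = np.deg2rad([45.0, 135.0, 225.0, 315.0])

# Encoding matrix: column j holds the first-order spherical-harmonic
# values (ACN order W, Y, Z, X) for speaker direction j.
E = np.stack([np.ones_like(az),    # W (omni)
              np.sin(az),          # Y
              np.zeros_like(az),   # Z (all speakers at elevation 0)
              np.cos(az)])         # X

# Decoding matrix: pseudo-inverse of the encoding matrix.
D = np.linalg.pinv(E)              # shape: (speakers, ambisonic channels)

b = np.array([1.0, 0.0, 0.0, 1.0]) # first-order frame: source straight ahead
feeds = D @ b                      # one gain per loudspeaker
```

For this frontal source, the two front speakers (at +45 and -45 degrees) come out with equal gains that dominate the rear pair, which is the behavior a decoder should show.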
G: So for 32 speakers you would have 64 HRTFs that you would process, 32 for each ear, sum them all up, and then you would get the third-order (or whatever-order) ambisonic sound field rendered to the ears. So this gives you a sense of a sound field that's around your head, and, just like if you were within a loudspeaker array, you could actually tilt your head from side to side.
G: You can turn left or right, up or down; the loudspeakers would stay exactly where they are, and hence you're actually able to rotate the sound field with respect to your head orientation. So this allows us to give the sense that there's an actual 3D space to this. Sorry, this is not...
G: This is head-tracked. In addition to that, a lot of designers that work in this space also want to have, you know, dialogue or soundtracks, things that they don't want head-tracked but that they still think are integral to creating the scene, and so we represent that with this pair of headphones that are also on your head. So if you extend the visual metaphor, the ambisonics sound field gets projected to these loudspeakers and your head moves.
G: The loudspeakers stay put, and the headphones represent a set of non-diegetic audio, non-head-tracked audio, that would follow your ears no matter which orientation you have. So that's a basic overview of ambisonics without going too deep into the math, and now we want to discuss how to add ambisonics into Opus, starting first with the mappings and then the calculations.
G: But there is... yeah, we'll explain that in a second. Okay, right. So for channel mappings 2 and 3, there's an expected number of channels depending on the order n, which can be 0 through 14, and an additional parameter j, which can be either 0 or 1. n describes the ambisonic order and, like I said before, the number of channels goes up quadratically with respect to the order. You see this in the (1 + n)^2, and the 2j is the addition of that headphone track.
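The channel-count rule just described, (1 + n)^2 ambisonic channels plus 2j for the optional stereo pair, is simple enough to state as code (a sketch; the function name is mine, not from the draft):

```python
def expected_channels(n, j):
    """Channels for the proposed mapping families: (1 + n)^2 ambisonic
    channels for order n (0..14), plus 2 more when the non-diegetic
    stereo pair is present (j = 1)."""
    if not (0 <= n <= 14 and j in (0, 1)):
        raise ValueError("order must be 0..14 and j must be 0 or 1")
    return (1 + n) ** 2 + 2 * j
```

First order with the headphone track gives 6 channels; third order without it gives 16.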
G: Additionally, if you're going to use mixed-order ambisonics, mixed order would be, for example, you may want to use third order horizontally but only first order vertically, and you would not include some of the basis functions for that mixed order. You would simply zero out the channels that you don't use and still send the full set of ambisonic channels for that order.
G: These ambisonic channels are ordered by what is called the ambisonic channel number, the ACN. This is a defined standard from the people that produced AmbiX, and so we follow that as well. It follows a very straightforward scheme, and we simply extend the ACN by including the additional left and right channels for the optional non-diegetic stereo at the end of the ACN channel numbers.
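For reference, the ACN scheme maps spherical-harmonic degree l and index m to the channel number l(l + 1) + m, and the proposal then appends the optional stereo pair after the last ACN channel. A sketch (helper names are mine):

```python
def acn(l, m):
    """Ambisonic channel number for degree l, index m (AmbiX ordering)."""
    if not -l <= m <= l:
        raise ValueError("m must satisfy -l <= m <= l")
    return l * (l + 1) + m

def channel_order(n, j):
    """ACN channels 0..(1 + n)^2 - 1, then the optional non-diegetic L/R."""
    labels = [f"ACN{acn(l, m)}" for l in range(n + 1) for m in range(-l, l + 1)]
    if j:
        labels += ["non-diegetic L", "non-diegetic R"]
    return labels
```

So for first order with the stereo pair, the stream order is ACN 0 through 3 followed by the two non-diegetic channels.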
G: Now into the details of the calculations. In terms of the coding details, the differences between channel mapping 2 and 3: channel mapping 2 is a direct per-ambisonic-channel coding scheme. The way this works is we code each ambisonic channel directly, and we have a variable bitrate allocation for each of those channels; more bits are placed in the omnidirectional channel and fewer bits are placed in the directional channels. Channel mapping 3 is a little bit fancier, and the proposal here is that, because the sound field often has a lot of coherence...
G: It's very often that the most compact representation of your sound field may in fact be better expressed in a transform space other than spherical harmonics, for example the loudspeaker projection I mentioned before, or some other arbitrary projection. So we offer the ability of introducing a transform to the encoder, known as the mixing matrix, and another transform, known as the demixing matrix, from the coded streams back to the output streams.

G: In this example, the U vector is our input streams, which go up to C, which is the number of ACN channels, with or without that additional stereo count. The encoder applies some mixing matrix A, which can be a linear matrix like this, or it could be something else depending on implementation details. The number of streams you end up coding is K, which will be the number of streams plus the number of coupled streams. The way we do the coupling is we couple starting from the top.
G: We just couple starting from the top, as we assume you're transforming the space into some sort of coherent representation so that you can take advantage of coupling each pair. X is the set of coded streams, and the demixing matrix reprojects X back into your output streams. This is the matrix that we propose to store in the header, so that as the encoder handles the mixing process it will also store this demixing matrix, and the decoder can interpret that during the demixing.
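The mixing/demixing flow described above can be sketched end to end. The concrete shapes here (9 second-order input channels mixed down to 4 coded streams, and a random A) are made-up illustrations; a real encoder would choose A based on the content:

```python
import numpy as np

rng = np.random.default_rng(0)

C, K, frames = 9, 4, 960              # C input channels, K coded streams
U = rng.standard_normal((C, frames))  # input streams, ACN order

A = rng.standard_normal((K, C))       # mixing matrix (encoder side)
X = A @ U                             # the K streams that actually get coded

D = np.linalg.pinv(A)                 # demixing matrix, stored in the header
U_hat = D @ X                         # decoder reprojects to C output streams
```

Since K is smaller than C, the round trip is an approximation; the bet is that real sound fields are coherent enough that a well-chosen A loses little.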
C: Echo, you can ask to speak at the mic and then we'll enable your audio to the room.
C: So the question is: the channel mappings seem pretty straightforward, but when you get into these higher-order ambisonics and lots of channels, is there any consideration of work to actually make the encoding more efficient? Is there room to do that, besides just channel mapping?
G: Yeah, so in terms of the encoding efficiency, a lot of that can be taken on by designing that A matrix. Especially with very, very high-order ambisonics, unless you're dealing with very sharp directional signals, you're going to have a lot of coherence in the signal, and it's very likely that you can think of a matrix transform that would put your channel count into a much more compact representation.
G: So, for example, let's say you have fifth-order ambisonics, which would be 64 channels, but you only have a handful of sources, two or three sources in the room at a given time: maybe a violin over here, someone talking over here, and a whale behind you. So you might actually have a matrix that can go from 64 to 3, and then you end up only needing to encode three channels, and the demixing would be the re-expression of that back into the 64.
A: So I think we had discussed on the list briefly that we drafted the original Ogg Opus draft poorly, in the sense that it says that any channel mapping family not listed in the original draft should be treated as channel mapping family 255 if you don't recognize what it is. And I think we discussed some updates to that draft that would essentially go back and fix that language to say something more sensible, but I didn't see any of those in your latest ambisonics draft.
G: I mean, yeah, I believe perhaps that was not clear; I'll have to go back and look at that. I think, if I understand you correctly, the concern is that if, for example, the decoder sees a channel mapping that's not 2 or 3 and doesn't know what to do with it, it reverts back to 255. Is that right?
A: Reverting to 255 basically says to use the same kind of channel mapping table that channel mapping family 1 has, and then just, you know, decode each individual channel and not try to say anything about what that channel means. Which I think works basically fine for channel mapping family 2, but it's more problematic for channel mapping family 3.
G: This was why, if you'd seen the previous draft doc that we had sent out, the initial one that we proposed had both the mapping table that you saw in mapping family 1 and this mixing matrix. And there was some discussion, I believe Mark Harris and a few other people mentioned, that you could actually optimize that out by just including effectively that mapping table as part of the demixing matrix. So we removed the mapping table from the most recent doc, which I still think...
A: So we could do that, or we could just, you know, fix the draft, fix RFC 7845, to say what I meant to say when I originally drafted it, which is that the actual contents of the channel mapping table depend on what your channel mapping family is, and if you see one you don't know, you really can't do much with it.
G: With K, I see, yeah. The correct formula should be straight from the AmbiX specification, so we'll make sure that that's cleaned up; sorry about that. Yeah, one of those might be zero-based and the other one is one-based; I think that's what it is, I think it's some shift of...
G: Actually, I think it's correct. So if we start from 0, K = 0 would... yeah, actually, I think it's correct. K = 1 would be degree negative 1, because K = 1 would be the second channel, which would be first order, degree negative 1. But I'll double-check, I'll make sure that it's worked out correctly and get back to you later today on that.
G: Yeah, it will expect the stereo to be at the end on the input as well as the output. How we actually encode it: we'll probably send it to the coupled stream, so the mapping table will probably account for that and move the last two channels so that they can be coded as the coupled stream, and then move them back, obviously. Okay.
G: No, I think the plan is, or the way it works right now (I haven't upstreamed this code yet) is: if it does not detect non-diegetic stereo, and we just have some (n + 1)^2 number of channels, we just code a set of mono streams. But if we have that...
G: So I think the way the doc reads right now, and our understanding of how we expect the input to be, is that you would always signal some (n + 1)^2 (with or without that plus two) number of channels for either channel mapping, and if you were using partial order, you would need to submit zero-padded channels. Okay.
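Zero-padding a partial-order input up to the full (1 + n)^2 channel set, as just described, might look like this (a sketch; the function and argument names are mine, not from the draft):

```python
import numpy as np

def pad_to_full_order(present, n, frames):
    """Build the full (1 + n)^2-channel input, zero-padding unused channels.

    `present` maps ACN index -> signal array for the channels actually used,
    e.g. a mixed-order scene that omits some vertical components; every
    other channel is sent as silence.
    """
    full = np.zeros(((1 + n) ** 2, frames))
    for idx, signal in present.items():
        full[idx] = signal
    return full
```

For example, a first-order scene that only uses the omni channel (ACN 0) and X (ACN 3) would still be submitted as four channels, with ACN 1 and 2 all zeros.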
A: You reviewed the updated draft? Yeah, I think there are a few people who have actually looked at the ambisonics draft who we haven't been able to get to express an opinion on the list, but I know I've received a number of off-list comments about it along the lines of: we think this is great, what's happening?