From YouTube: Truly Excellent Digital Voice Quality: Opulent Voice
Description
Ham Expo presentation from September 2022 by Paul Williamson, KB5MU.
Transmitted voice can never be perfect. Some amateur radio voice modes, especially digital ones, are very much less than perfect. We propose and demonstrate a prototype of Opulent Voice, a higher bit rate digital voice mode that brings voice quality up to a new level.
Hi, I'm Paul Williamson, KB5MU, and I have a few words to say about voice quality in amateur radio. Since nearly the dawn of radio in the early 20th century, the spoken voice has been one of perhaps four main kinds of program material transmitted. Written text was the first: Morse code, radiotelegraph, and so on.
If not for these prohibitions, we would never have been allowed to share the amateur bands as we always have: informally, dynamically, and without much government involvement. Video has a different history. It's also strongly associated with broadcasting, but since video transmission was out of reach for amateurs in the early years, there was no need for rules prohibiting it.
The speaker creates sound, which is just pressure waves in the air. The listener detects these pressure waves using special hardware called ears, and processes the resulting measurements using even more special hardware called a brain. The subjective result created by that processing is what we experience as hearing. A perfect voice transmission system would be able to recreate that experience with no degradation whatsoever.
We have some powerful mathematics for single-channel signals. One thing we know mathematically is that any signal can be decomposed into components by frequency and reassembled without loss, so we can meaningfully describe a signal by its frequency distribution. Because of the way the human vocal tract works, a voice signal typically has a lowest frequency component, or fundamental, somewhere in the range of about 50 to 300-ish hertz.
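That decompose-and-reassemble claim is easy to check numerically. A minimal sketch with NumPy, using an invented test signal (a 150 Hz "pitch" plus two harmonics):

```python
import numpy as np

# Invented test signal: a 150 Hz fundamental plus two harmonics, at 8 kHz.
fs = 8000
t = np.arange(fs) / fs                # one second of sample times
signal = (np.sin(2 * np.pi * 150 * t)
          + 0.5 * np.sin(2 * np.pi * 300 * t)
          + 0.25 * np.sin(2 * np.pi * 450 * t))

# Decompose into frequency components...
spectrum = np.fft.rfft(signal)
# ...and reassemble: the round trip is lossless to numerical precision.
reconstructed = np.fft.irfft(spectrum, n=len(signal))

print(np.max(np.abs(signal - reconstructed)))  # on the order of 1e-15
```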
It turns out that human hearing doesn't necessarily need to actually detect the fundamental. The listener seems to hear the fundamental even if only its harmonics are present, so for voice communication systems it's common to assume that there's no need to transmit any frequency components below 300 hertz.
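The missing-fundamental effect can be sketched numerically: build a signal from harmonics of a hypothetical 100 Hz pitch, keeping only components at or above 300 Hz, and the waveform still repeats at the 100 Hz period even though no 100 Hz component is present.

```python
import numpy as np

fs = 8000
t = np.arange(fs) / fs        # one second, so FFT bins fall on 1 Hz boundaries
f0 = 100                      # hypothetical pitch; deliberately NOT transmitted

# Keep only harmonics at or above 300 Hz, as a telephone channel would.
signal = sum(np.sin(2 * np.pi * k * f0 * t) for k in range(3, 8))

# The spectrum contains no energy at the 100 Hz fundamental...
spectrum = np.abs(np.fft.rfft(signal)) / len(t)
print(round(spectrum[100], 3), round(spectrum[300], 3))  # → 0.0 0.5

# ...yet the waveform still repeats every 1/f0 = 10 ms, the periodicity
# the ear hears as a 100 Hz pitch.
period = fs // f0  # 80 samples
print(np.max(np.abs(signal[period:] - signal[:-period])) < 1e-6)  # → True
```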
A young person with undamaged hearing may be able to hear as high as 20 kilohertz, but this degrades with age and exposure to loud sounds to 12 kilohertz, 10 kilohertz, or even lower. Human hearing is also insensitive below about 20 hertz, so it's commonly said that the range of human hearing is 20 to 20,000 hertz. A system that reproduces all frequencies equally from 20 hertz to 20 kilohertz with perfect precision would certainly be considered a high fidelity system, suitable for the most demanding music listening.
Experiments with voice telephony have shown that the frequencies that are most important to voice intelligibility are those from 300 hertz to just 3,000 hertz. Listeners judge a voice as intelligible even if all the components below 300 hertz or above 3,000 hertz are removed. Again, that doesn't mean the listener can't tell that the frequencies are missing: any listener with normal hearing will certainly be able to notice the missing information in an otherwise clean environment. A hi-fi system with wide frequency response is a superior listening experience.
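That 300 to 3,000 hertz band-limiting can be mimicked crudely with an FFT mask. A sketch only: the tone frequencies are invented, and a real telephone channel would use a proper filter rather than zeroing FFT bins.

```python
import numpy as np

fs = 8000
t = np.arange(fs) / fs        # one second, so FFT bins fall on 1 Hz boundaries
# Invented tones at 100, 1000, and 3500 Hz; only the middle one is in-band.
x = (np.sin(2 * np.pi * 100 * t)
     + np.sin(2 * np.pi * 1000 * t)
     + np.sin(2 * np.pi * 3500 * t))

# Crude band-limiting: zero every component outside 300-3000 Hz.
X = np.fft.rfft(x)
freqs = np.fft.rfftfreq(len(x), d=1 / fs)
X[(freqs < 300) | (freqs > 3000)] = 0
y = np.fft.irfft(X, n=len(x))

amp = np.abs(np.fft.rfft(y)) / len(x)
print(round(amp[100], 3), round(amp[1000], 3), round(amp[3500], 3))
# → 0.0 0.5 0.0: only the in-band tone survives
```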
All the voices occupy roughly the same frequencies, so you might think that they would interfere with each other catastrophically and the listener would be unable to make any sense out of them, but we know from common experience that this is not the case at a crowded cocktail party, surrounded by many conversations.
This produces a radio signal that can be demodulated by a simple power detector. The simplest AM receiver and transmitter can each be implemented with just one transistor or vacuum tube. AM produces a radio signal that looks like this, viewed in the frequency domain, where amplitude is plotted against frequency. That spike in the middle is called the carrier.
That's the signal we feed into the variable gain amplifier in the transmitter. The stuff above the carrier is ideally an exact copy of the microphone signal; it's called the upper sideband. The stuff below the carrier, the lower sideband, is another exact copy of the microphone signal, but inverted in frequency. So low voice pitches make sidebands that are close to the carrier, and higher voice pitches make sidebands that are further away.
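The carrier-plus-two-sidebands picture falls straight out if you compute the spectrum of a tone-modulated AM signal. The carrier and tone frequencies below are arbitrary choices for the sketch, not anything from the talk.

```python
import numpy as np

fs = 100_000                  # simulation rate; one second gives 1 Hz FFT bins
t = np.arange(fs) / fs
fc, fm_tone = 10_000, 1_000   # arbitrary carrier and "voice" tone frequencies

# AM: the message scales the amplitude of the carrier.
am = (1 + 0.5 * np.cos(2 * np.pi * fm_tone * t)) * np.cos(2 * np.pi * fc * t)

amp = np.abs(np.fft.rfft(am)) / len(t)
# Carrier at fc, upper sideband at fc + fm_tone, lower sideband at fc - fm_tone.
print(round(amp[fc], 2), round(amp[fc + fm_tone], 3), round(amp[fc - fm_tone], 3))
# → 0.5 0.125 0.125
```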
Perhaps worse, a lot of our transmitter power is going into the carrier, which doesn't carry any information at all, except to provide a frequency reference. On the plus side, AM does not require any processing of the audio signal at all. The system is so simple: it just transmits a copy of the signal as it came from the microphone.
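How much power the carrier eats is standard AM arithmetic (this figure is textbook material, not stated in the talk): with a single modulating tone at modulation index m, total power is carrier power times (1 + m²/2), so even at 100 percent modulation the carrier takes two thirds of the transmitted power.

```python
# Single-tone AM power budget: total = carrier_power * (1 + m**2 / 2).
m = 1.0                                  # 100% modulation, best case for sidebands
carrier_fraction = 1 / (1 + m**2 / 2)    # fraction of total power in the carrier
print(round(carrier_fraction, 3))        # → 0.667
```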
Nonetheless, even today, this potential for high audio quality attracts a group of amateurs who operate vintage AM radios on the 75 meter band. With care, under perfect conditions, they just sound great. Inevitably, AM came to be replaced by a less wasteful method: SSB, which stands for single sideband.
You may have heard of the Collins S-Line radios. They were highly prized and expensive, mainly because they used very high quality SSB filters. With poor filters, which were more common, SSB can be pretty hard to listen to. Even with good filters, SSB has a problem: without a carrier there's no frequency reference, so reception depends on highly accurate and stable tuning.
This is less of a problem with modern radios, which usually have good frequency accuracy and stability. But if the tuned frequency is off, the received SSB signal comes out frequency shifted by the same amount. All the voice pitches are off. When the pitches are too high, it sounds a little bit like Donald Duck.
When SSB was introduced into amateur radio, many users objected. The voice quality was definitely worse, not for any unavoidable fundamental reason; perfectly implemented SSB can sound just as great as AM, in theory. In practice, though, SSB signals are usually not as good. Tuning them correctly and listening to them for long periods of time without fatigue are skills that can be difficult to learn.
If an interfering signal is present within the bandwidth of the desired signal, both AM and SSB pass the interference right through to the speaker. As radio conditions deteriorate and signal strength fades, the desired signal can be swallowed up by noise. However, the degradation is gradual and relatively easy to listen to.
The last main analog method of modulation is FM, frequency modulation. In FM, the voltage from the microphone is used to vary the frequency of an oscillator in the transmitter, instead of the gain of an amplifier as in an AM transmitter. Like AM, an FM signal has a central carrier and two sidebands, but the math works out differently.
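The "vary the frequency of an oscillator" description translates directly into code: integrate the instantaneous frequency to get the oscillator phase. The carrier, tone, and deviation values below are arbitrary illustrative choices.

```python
import numpy as np

fs = 100_000                             # simulation sample rate
t = np.arange(fs) / fs
fc, fm_tone, dev = 10_000, 1_000, 2_000  # carrier, tone, peak deviation (arbitrary)

# The microphone voltage (here a test tone) sets the instantaneous frequency;
# integrating (cumulatively summing) that frequency gives the phase.
msg = np.cos(2 * np.pi * fm_tone * t)
phase = 2 * np.pi * np.cumsum(fc + dev * msg) / fs
fm_signal = np.cos(phase)

# Unlike AM, the envelope stays constant: the information rides in the phase.
print(round(float(np.max(np.abs(fm_signal))), 2))  # → 1.0
```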
A commercial broadcast FM signal, with all its auxiliary subcarriers, occupies 200 kilohertz. In amateur radio, FM signals were originally allocated 60 kilohertz channels. This has been trimmed down to 30, 25, 20, 15, and even 12.5 kilohertz in some cases because of overcrowding. Even at these reduced channel spacings, FM inherently sounds pretty good.
If the signal is weak, random clicks that sound a bit like popcorn popping start to intrude on the received signal. Then, as the signal gets even weaker, noise comes up and overwhelms the received signal rather quickly. Because FM occupies a much wider bandwidth, its use is not allowed below the 10 meter band. On VHF and UHF, though, FM is king, for a number of reasons.
It may be hard to believe if you tune around the bands now, but our VHF and UHF bands, especially 2 meters and 70 centimeters, used to be overcrowded in major metropolitan areas. Every available repeater pair was in use, and during prime commuting time every repeater was busy with conversations. There was a demand for increased capacity.
The minimum standard for success was to fit two voice channels into the same 12.5 kilohertz channel that had accommodated just one. In the 1990s and early 2000s, industry developed several competing digital radio standards, including DMR and P25. The Japan Amateur Radio League developed D-STAR. In 2013, Yaesu introduced their system, Fusion, which is nearly identical to P25.
Here's where another important mathematical theorem comes into play. The Nyquist sampling theorem says that we can always sample a signal like this, as long as the sampling rate exceeds twice the maximum bandwidth of the signal. The stream of samples captures all of the information in the signal, and the signal can be losslessly reconstructed from the samples. That may not be intuitive, even if you've studied the proof of the theorem.
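The flip side of the theorem is easy to demonstrate: sample below twice the bandwidth and two different frequencies produce identical samples, so information really is lost. A sketch at an 8 kHz rate with arbitrarily chosen tones:

```python
import numpy as np

fs = 8000                      # sampling rate
n = np.arange(fs)              # one second of sample indices

# A 5 kHz tone violates the Nyquist limit at fs = 8 kHz. Its samples are
# identical to those of a 3 kHz tone (the alias at 8000 - 5000 Hz), so the
# two signals cannot be told apart after sampling.
above_nyquist = np.cos(2 * np.pi * 5000 * n / fs)
alias = np.cos(2 * np.pi * 3000 * n / fs)

print(np.max(np.abs(above_nyquist - alias)))  # effectively zero
```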
But it's a fact. So if we want to sample microphone data that contains frequencies up to 3,000 hertz for standard communications grade voice, we need to sample at least 6,000 times per second. The implementation turns out to be easier if we leave some extra room, so typically we will sample 8,000 times per second. Each sample needs enough bits to accurately capture the voltage; the convenient size is 16 bits per sample.
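Those two numbers pin down the raw, uncompressed bitrate:

```python
# Raw bitrate of the uncompressed voice stream described above.
sample_rate = 8000       # samples per second
bits_per_sample = 16
raw_bps = sample_rate * bits_per_sample
print(raw_bps)           # → 128000 bits per second, before any coding
```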
Speaker recognition, the ability to distinguish one person's voice from another, is significantly impaired. Listener fatigue is high; it's just not in any way pleasant to listen to. When the radio signal degrades enough to introduce some errors, these digital voice modes sound worse: distortion goes up, and intelligibility and speaker recognition go down. Even on internet connections, when radio transmission errors are not a factor, it isn't uncommon to be unable to understand what the other party is saying.
We do have room for this. The amateur bands are not that crowded anymore, and it's no longer difficult to use UHF frequencies instead of 2 meter VHF. The regulations, at least in the United States, allow much more bandwidth for digital transmissions in the 222 megahertz band and every band above that. In particular, for a digital satellite system that uses the 5 gigahertz and 10 gigahertz microwave bands for the uplink and downlink respectively, as proposed by ORI, there's plenty of room for a hundred of these channels.
We use the library Opus implementation, libopus, and we tell the Opus encoder that it is encoding single channel speech. We max out the input sample rate at 48,000 samples per second for full band frequency coverage, and choose one of their recommended frame sizes, 20 milliseconds. Importantly, we've chosen an output bitrate of 16,000 bits per second. That's between six and seven times as many bits as the AMBE-based digital voice modes use.
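A quick sketch of the framing arithmetic those settings imply. These are derived numbers only; the actual libopus calls that set them up are not shown here.

```python
# Derived framing numbers for the stated settings: 48 kHz single-channel
# input, 20 ms frames, 16,000 bits per second constant output.
sample_rate = 48_000
frame_ms = 20
bitrate_bps = 16_000

samples_per_frame = sample_rate * frame_ms // 1000      # input samples per frame
payload_bytes = bitrate_bps * frame_ms // (1000 * 8)    # encoded bytes per frame
frames_per_second = 1000 // frame_ms

print(samples_per_frame, payload_bytes, frames_per_second)  # → 960 40 50
```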
16 kilobits is actually near the low end of the bitrates supported by Opus. That's appropriate, since we're only encoding speech; we don't need high fidelity reproduction of music. To simplify the prototype design, we're currently using a constant output bitrate. This will probably change to a variable bitrate in the full implementation.
The M17 project is another effort to replace the AMBE-based digital voice modes with something better, but with somewhat different goals. Like us, they wanted to replace the AMBE voice codec with something free and open, and with better quality, but they also wanted to fit within traditional channel spacing, so they could not greatly increase the bitrate.
They use it at its maximum bit rate of 3,200 bits per second, only about one third faster than the AMBE modes. On top of that, they use a rate one-half convolutional forward error correction code to protect the voice bits from errors, and we've kept that, along with many other design decisions that we didn't see any need to change.
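A rate one-half convolutional code emits two output bits for every input bit. A toy encoder sketch: the generator polynomials here are the classic textbook K=3 pair, chosen for illustration, not necessarily the polynomials the M17 specification uses.

```python
# Toy rate one-half convolutional encoder with illustrative polynomials
# (G1 = 0b111, G2 = 0b101, constraint length K = 3).
def conv_encode(bits, g1=0b111, g2=0b101, k=3):
    state = 0
    out = []
    for b in bits + [0] * (k - 1):          # trailing zeros flush the encoder
        state = ((state << 1) | b) & ((1 << k) - 1)
        out.append(bin(state & g1).count("1") % 2)   # parity bit from G1
        out.append(bin(state & g2).count("1") % 2)   # parity bit from G2
    return out

encoded = conv_encode([1, 0, 1, 1])
# Two output bits per input bit (including the flush bits): rate one half.
print(len(encoded))  # → 12
```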
Voice quality through Opulent Voice is not perfect, but it is genuinely very, very good. At least I think so. What do you think? You've just listened to this entire talk through the prototype Opulent Voice implementation. As Madge would say: you're soaking in it. I'll be happy to answer any questions you may have in the remaining time.