From YouTube: ONNX20210324 V06 ONNXRuntimeforMobileScenarios
A: I'm happy to announce that tf2onnx now supports creating ONNX models directly from TFLite. Download the tool via GitHub or pip, and use the --tflite flag to perform a conversion.
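For example, a minimal invocation of the tf2onnx CLI as described above (file names are placeholders):

```
python -m tf2onnx.convert --tflite model.tflite --output model.onnx
```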
Direct conversion is particularly useful when a model is only published in the TFLite format, or when you want to utilize optimizations that are only present in the TFLite version of a model, such as quantization. Also, TFLite models are generally simpler than their TensorFlow counterparts, so tf2onnx may be able to convert the TFLite version of a model even if conversion from TensorFlow fails.
The conversion process itself is relatively straightforward. In the rewriter phase, we look for sets of common TFLite graph patterns that can be efficiently merged into individual ONNX ops. Next, in the handler phase, we convert the remaining TFLite ops to ONNX. Since TensorFlow Lite ops are often similar to their TensorFlow counterparts, we reuse our existing TensorFlow-to-ONNX logic where possible. We may need to insert Cast, Reshape, Transpose, and similar ops to account for differences between the TensorFlow Lite and ONNX op specifications.
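As an illustration of the handler idea, here is a hedged sketch (this is not tf2onnx's actual internal API; the function and tensor names are invented) of mapping a TFLite FULLY_CONNECTED op to ONNX with an inserted Transpose bridging the weight-layout difference:

```python
# Illustrative only: TFLite FULLY_CONNECTED stores weights as
# (out_units, in_units), while ONNX MatMul expects (in_units, out_units),
# so a Transpose node is inserted to account for the spec difference.
from onnx import helper

def convert_fully_connected(inp: str, weight: str, out: str):
    transpose = helper.make_node(
        "Transpose", [weight], [weight + "_T"], perm=[1, 0]
    )
    matmul = helper.make_node("MatMul", [inp, weight + "_T"], [out])
    return [transpose, matmul]
```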
One aspect of TFLite that requires a bit more care is quantization. In TFLite, quantization data, such as the scale and zero-point values, is associated directly with the model tensors, and if an op has quantized inputs and outputs, TFLite will automatically use the quantized version of that op. In ONNX, we can do something similar by converting each quantized tensor into a pair of quantize/dequantize (QuantizeLinear/DequantizeLinear) ops; ONNX Runtime will automatically substitute in the quantized version of an op if all of its inputs and outputs are quantized.
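A small sketch of that quantize/dequantize pairing using the onnx helper API (the tensor names and quantization parameters here are made up for illustration):

```python
# Each quantized tensor carries its TFLite scale and zero-point into a
# DequantizeLinear/QuantizeLinear pair (values are illustrative).
import numpy as np
from onnx import helper, numpy_helper

x_scale = numpy_helper.from_array(np.array(0.0235, dtype=np.float32), "x_scale")
x_zp = numpy_helper.from_array(np.array(12, dtype=np.uint8), "x_zp")

dequant = helper.make_node(
    "DequantizeLinear", ["x_quant", "x_scale", "x_zp"], ["x_float"]
)
quant = helper.make_node(
    "QuantizeLinear", ["y_float", "x_scale", "x_zp"], ["y_quant"]
)
```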
If you want to remove quantization from a TFLite model, you can use the --dequantize flag during conversion.
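For example (same placeholder file names as before; the flag name is as given in the talk):

```
python -m tf2onnx.convert --tflite quantized.tflite --dequantize --output float.onnx
```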
Quantization clips values outside the expressible range of the quantized data types, and TFLite often relies on this clipping instead of explicit ReLU and ReLU6 ops. tf2onnx will use the range of the quantized tensors to automatically detect and re-insert the removed ReLU and ReLU6 ops.
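A hedged sketch of that range-based detection idea (not tf2onnx's actual code; the function is invented for illustration): a uint8 tensor with scale s and zero-point z can only represent values in [(0 - z) * s, (255 - z) * s], so a range clipped at 0, or at 0 and 6, implies a removed ReLU or ReLU6.

```python
from typing import Optional

def detect_removed_activation(scale: float, zero_point: int) -> Optional[str]:
    lo = (0 - zero_point) * scale
    hi = (255 - zero_point) * scale
    if lo == 0.0 and abs(hi - 6.0) < 1e-2:
        return "Relu6"  # range [0, 6] suggests ReLU6 was folded into quantization
    if lo == 0.0:
        return "Relu"   # range starting at 0 suggests ReLU was folded in
    return None
```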
B: In order to minimize binary size, we include in the build only the operator kernels required to satisfy the models that you wish to deploy. Additionally, you can reduce the types supported by these operator kernels for further reductions in binary size. A custom format is also used for the model file.
To specify the operator kernels to include in a build, a configuration file is used. The model conversion script can automatically generate this configuration file from the models you convert or, alternatively, it can be manually created and edited. As you can see from this example configuration, the syntax is quite simple, with the domain, the opset, and the operator names.
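A hedged example of such a configuration file (each line is domain;opset;comma-separated operator names; the operator list here is invented for illustration):

```
ai.onnx;12;Add,Clip,Conv,GlobalAveragePool,MatMul,Relu,Reshape,Softmax
```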
You can also limit the types that the operator kernels support. Again, the model conversion script can automatically detect these types and generate a configuration file with them or, alternatively, you can specify a global list of types to support. When using model-based type reduction, you will generally see a reduction in the kernel binary sizes of between 25% and 33%.
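For reference, a hedged sketch of the corresponding minimal-build invocation (flag names as I understand ONNX Runtime's build options; paths are placeholders):

```
./build.sh --config MinSizeRel --minimal_build \
    --include_ops_by_config required_operators.config \
    --enable_reduced_operator_type_support
```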
At a high level, this is the ONNX Runtime Mobile usage: you take your ONNX models and put them in a directory to run the conversion script against.
They will be optimized, and an optimized ORT-format model will be produced, along with a configuration file that contains the operators that are required and, optionally, the types that are required. This configuration file is used to build the ONNX Runtime package, which is then deployed to enable inferencing on device.
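A hedged sketch of that flow using ONNX Runtime's model conversion script (module path per the ONNX Runtime Mobile documentation of that era; the directory name is a placeholder):

```
python -m onnxruntime.tools.convert_onnx_models_to_ort /path/to/models
```

The resulting ORT-format model can then be loaded for inference, for example:

```python
# Load the converted ORT-format model (path is a placeholder).
import onnxruntime as ort

session = ort.InferenceSession("model.ort")
```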
As you can see here, the binary size for a build with the operators required to support MobileNet is well under one megabyte. If we enable the reduced type support, we get a 31% reduction in the size of the kernels, and that package would have a size of 325 kilobytes when compressed into the Android archive (AAR) that you would use to deploy your app.
NNAPI usage is possible on Android: based on the device capabilities and whether NNAPI is available, the model execution will dynamically adjust to use NNAPI where possible.