Recording

Okay, so my name is Martin Croome, from GreenWaves Technologies, and I'm Vice President of Marketing with GreenWaves.

We've been doing some work recently on ONNX support in our toolchain, and I wanted to highlight our experiences, good and bad, with doing that.

Our first product is actually shipping; it was production qualified at the beginning of last year, so we have customers using it seriously and putting it into products. Part of the way that we control energy on GAP8 is to use a strategy that moves data across the chip in a way that is predetermined at compile time.

So we don't actually use data caches inside the chip, and the reason for that is that data caches tend to be extremely inefficient on streaming workloads. You probably get about a 30% cache hit ratio, which means that 70% of your loads are being thrown away, which is a lot of energy being used up for no particularly good reason.

So what we do is use software tools, in particular a tool that we call the AutoTiler, which is essentially a memory planning tool. It takes a model of the operations that need to be done and searches for an optimal memory movement across the memory hierarchy, whether that is memory external to the chip or the L2 and L1 memory inside the chip.

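To make the idea concrete, here is a minimal sketch of the kind of search a memory planner like the AutoTiler might perform. The L1 budget, the row-wise tiling, and the transfer-count cost model are illustrative assumptions, not GreenWaves' actual algorithm.

```python
# Illustrative memory-planning search in the spirit of the AutoTiler.
# The L1 budget, tensor shape, and cost model are hypothetical.

L1_BUDGET = 64 * 1024  # bytes of cluster L1 assumed available for tiles

def plan_tiling(height, width, channels, bytes_per_elem=1):
    """Pick the largest row-tile whose input and output copies fit in L1,
    minimising the number of DMA transfers from L2/external memory."""
    best = None
    for tile_rows in range(height, 0, -1):
        tile_bytes = 2 * tile_rows * width * channels * bytes_per_elem  # in + out
        if tile_bytes > L1_BUDGET:
            continue
        num_transfers = -(-height // tile_rows)  # ceiling division
        if best is None or num_transfers < best[1]:
            best = (tile_rows, num_transfers)
    return best

print(plan_tiling(height=128, width=128, channels=8))  # -> (32, 4)
```
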
We also have a compute cluster, a multi-core compute cluster, in the GAP8 product: eight cores with a shared memory architecture. So really the most important thing for us is to bring data in, hide all of that data movement behind computation on the cluster, and keep the cluster busy.

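The classic way to hide data movement behind computation is double buffering: fetch the next tile while computing on the current one. The sketch below shows the control structure only; in plain Python the "DMA" stand-in is synchronous, whereas on a chip like GAP8 the copy would run in parallel with the compute.

```python
# Toy double-buffering loop: while the cluster computes on one tile,
# the next tile is conceptually already in flight.

def dma_fetch(tiles, i):
    """Stand-in for an asynchronous DMA read of tile i into L1."""
    return tiles[i] if i < len(tiles) else None

def process(tiles):
    results = []
    current = dma_fetch(tiles, 0)              # prime the first buffer
    for i in range(len(tiles)):
        nxt = dma_fetch(tiles, i + 1)          # start fetching the next tile
        results.append([x * 2 for x in current])  # compute on the current tile
        current = nxt                          # swap buffers
    return results

print(process([[1, 2], [3, 4], [5, 6]]))  # -> [[2, 4], [6, 8], [10, 12]]
```
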
Basically, we have a complete flow, and the AutoTiler is part of that flow. One of the things the AutoTiler takes in is quite a high-level model of the kernels that need to be executed, and we have implemented specific kernels for each operation.

So we heavily use fused kernel operations, which are handcrafted to get the maximum energy efficiency out of the platform.

We then have a tool called NNTool, which can suck in a TFLite, and now an ONNX, graph and produce that model. Essentially it is a lowering tool: it lowers the TFLite or ONNX representation onto the kernels that we have implemented inside it.

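As a rough illustration of that lowering step, the sketch below walks an ONNX graph with the public `onnx` Python package and maps each node onto a backend kernel. The kernel table and its names are hypothetical placeholders, not NNTool's actual internals.

```python
# Minimal sketch of lowering an ONNX graph onto backend kernels.
import onnx

KERNEL_TABLE = {
    "Conv": "gap_conv_kernel",     # hypothetical backend kernel names
    "Relu": "gap_relu_kernel",
    "MaxPool": "gap_pool_kernel",
}

def lower(model_path):
    model = onnx.load(model_path)
    plan = []
    for node in model.graph.node:
        kernel = KERNEL_TABLE.get(node.op_type)
        if kernel is None:
            raise NotImplementedError(f"no kernel for {node.op_type}")
        plan.append((node.name, kernel))
    return plan
```
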
I wanted to give that background so you understand some of the comments I'm going to make afterwards. GAP8, which is our first product, only operates on fixed point, so quantization is extremely important to us.

NNTool can handle those quantization steps, with various different strategies inside it, or it can suck in quantization information, currently only from TensorFlow Lite, and then use those tensor statistics, and some indications of the quantization which TensorFlow has applied, to apply a quantization that is compatible with our kernels.

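As an illustration of that kind of step, here is a minimal sketch of deriving a fixed-point format from imported min/max tensor statistics. The symmetric power-of-two (Q-format) scheme shown is an assumption for illustration, not GreenWaves' exact algorithm.

```python
# Derive a signed fixed-point format from min/max tensor statistics.
# The symmetric power-of-two (Q-format) scheme here is illustrative.
import math

def q_format_from_stats(t_min, t_max, bits=8):
    """Choose the integer/fractional bit split for a signed Q-format value."""
    magnitude = max(abs(t_min), abs(t_max), 1e-12)
    int_bits = max(0, math.ceil(math.log2(magnitude)))
    frac_bits = bits - 1 - int_bits  # one bit reserved for the sign
    return int_bits, frac_bits

print(q_format_from_stats(-3.2, 5.7))  # -> (3, 4): Q3.4 in 8 bits
```
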
So, the experience with ONNX. What has gone very well, and what I really appreciate with ONNX in particular versus TensorFlow Lite, is a really understandable operator set and structure.

There is fairly little duplication of operators doing more or less the same thing in a different way. There is great documentation: the operators are really, really well documented, and that is really appreciated by anyone doing development work with them. There is also a great operator versioning system.

So when you update your operators, the versioning system is really appreciated; it allows us to import with confidence across multiple different versions, which is a difficult thing to handle.

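That versioning information is carried in the model's opset imports; a small example of reading it on import with the `onnx` package (the model path is illustrative):

```python
# Read the opset versions a model was exported against, so an importer
# can decide, per domain, which operator semantics to apply.
import onnx

model = onnx.load("model.onnx")  # path is illustrative
for opset in model.opset_import:
    domain = opset.domain or "ai.onnx"  # empty string means the default domain
    print(f"domain={domain} version={opset.version}")

onnx.checker.check_model(model)  # validates the nodes against those opsets
```
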
There are two areas where I really wanted to say I think there could be improvement. The first is quantization, and the second is something I call fusion friendliness, which I'll come to in a following slide. But let's deal with quantization first.

So at the moment it seems like in ONNX there is a mix of some fake quantization operators and scale quantization operators, and then there are a few more complex cases, like convolution, I think, and perhaps a linear layer, where there is actually a specific quantized implementation of that kernel. And I'm really wondering: what is the goal here?

A
Is
it
to
express
a
quantized
graph
on
X
graph
directly
and
then
and
then
run
that,
or
is
it
to
provide
the
necessary
information
for
a
back
end
to
provide
its
own
quantization
scheme
for
the
graph
or
essentially
lower
onto
its
own
quantization
scheme,
whatever
that
might
be,
I
mean
if
it's
expressing
a
quantized
graph
directly
so
that
you
can
run
it
in
its
form
in
on
an
X?
A
That
seems
like
a
really
open-ended
subject,
because
you're
going
to
provide
quantized
operators
for
every
single
scheme
with
every
single
different
quantization
technique,
you
know,
are
you
going
to
start
support?
Sub
byte
quantization
variable
bit
width
quantization?
How are you going to support tensor compression? There are loads of things which map closely to the hardware and have a really direct effect in terms of performance, and I would prefer that you concentrate on the latter, or at least that it is something you consider strongly. And if the latter is something you consider strongly, then more information is needed than is currently there.

We obviously have parameter statistics, because we have the parameters. The activation statistics we don't have. TensorFlow Lite currently, for all the tensors that are brought in in a quantized graph, gives you at least minimum and maximum information on every single tensor. It would be really nice to also get standard deviation and mean information; with that we would be very happy and would be able to do quite a bit, mapping onto our existing quantized operators.

It would be nice to have some more statistics, particularly min/max, standard deviation and mean by channel, and also potentially some outlier statistics, in terms of weak and strong outliers. Those would help us do a better job on the quantization. So my suggestion is that you add statistics metadata to every tensor; you could do it just for the non-constants.

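To show what that suggestion could look like in practice: ONNX has no dedicated per-tensor statistics field today, so the sketch below invents a `stats:<tensor_name>` key convention in the model-level `metadata_props`, purely to illustrate what an exporter could record. The tensor name and values are made up.

```python
# Sketch of the proposal: attach calibration statistics per activation
# tensor. The "stats:<name>" key convention is invented for illustration;
# ONNX itself defines no such field.
import json
import onnx

def attach_stats(model, stats):
    """stats: {tensor_name: {"min": .., "max": .., "mean": .., "std": ..}}"""
    for name, s in stats.items():
        entry = model.metadata_props.add()
        entry.key = f"stats:{name}"
        entry.value = json.dumps(s)
    return model

model = onnx.load("model.onnx")  # path is illustrative
attach_stats(model, {"conv1_out": {"min": -1.2, "max": 3.4,
                                   "mean": 0.1, "std": 0.7}})
onnx.save(model, "model_with_stats.onnx")
```
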
The second area I'd like to cover is what I call fusion friendliness. There is one thing which is a big problem for us: we have optimized fused kernels, and it is very difficult to get to highly optimized fused kernels from elementary operators.

So you do have some fused operators. For example, you have a GRU operator, which is great, because TensorFlow Lite doesn't have it. But there seems to be a move towards functions containing subgraphs, that is, towards a lot of the fused operators being composed of subgraphs of elementary operators. That's fine, as long as we know what they are. And I think the solution would be to force, or in some way make it the nicest thing to do. I said "force"; maybe that's a bit strong a word, but really to encourage the exporter writers to wrap any native high-level operators on the platform they are exporting from in a function, with a function namespace and function name that somehow indicates to us where it came from: what was it before it got turned into this subgraph? This would allow us to choose to say: okay, we have a good fused version of that operator, and we can map straight onto it.

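Here is a minimal sketch of the import-side benefit being described: if exporters wrap high-level ops in named, namespaced functions, a backend can either map a function call straight onto a fused kernel or fall back to inlining its subgraph body. The domain and kernel names are hypothetical, and reading `model.functions` assumes a recent ONNX IR version.

```python
# Sketch: resolve function-wrapped nodes either to a fused backend kernel
# or to their elementary-operator body. Names are hypothetical.
import onnx

FUSED_KERNELS = {("org.tensorflow", "GRU"): "gap_gru_fused"}

def resolve(model):
    functions = {(f.domain, f.name): f for f in model.functions}
    for node in model.graph.node:
        key = (node.domain, node.op_type)
        if key in FUSED_KERNELS:
            print(f"{node.op_type}: using fused kernel {FUSED_KERNELS[key]}")
        elif key in functions:
            body = functions[key]
            print(f"{node.op_type}: inlining {len(body.node)} elementary ops")
```
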
A
So
those
are
the
the
two
points
that
I
really
wanted
to
make
and
with
both
of
those
points
implemented.
I
think
on
X
would
be
definitely
the
best
solution
for
for
graph
export
available
at
the
moment.
Thank
you.