From YouTube: ONNX Roadmap Discussion #3, 2021-09-22
Description:
1. Rajeev Nalawadi & Manuj Sabharwal (Intel) – Address gaps with opset conversions across a broad set of models
2. Rajeev Nalawadi (Intel) – ONNX Model Zoo example for an E2E distributed-training scenario for large models
3. Rajeev Nalawadi, Rodolfo Esteves (Intel) – Define the concept of federated learning for ONNX (multi-edge training and model aggregation)
A: Okay, hello everyone, welcome to the third roadmap discussion. If you've attended one before, I think you'll find it very productive. We have 30 minutes and three presentations of 10 minutes each, so to the extent you can, try to stick to the timing, because it's pretty tight, and leave some time for questions and discussion. Everything is recorded; it will be posted on YouTube, and the slides, if you agree, we'll repost on Slack. The first presentation is by Manuj, and it addresses gaps with opset conversions across a broad set of models.
B: Okay, so yeah, I'm Manuj from Intel. I don't know if Arpanad has joined, but we are on the same team with Rajeev, and currently I own ecosystem enablement for ONNX, especially with our partners, the software vendors working on the client side. So I'll go over some of the issues and gaps we are working on. This is across a large set of customers, not just one, and as they have started adopting ONNX models, we are seeing some issues and gaps on the conversion side.
B: One of the basic problems we are seeing is opset conversion. New use cases keep coming and there's a backlog. Once these companies, our ecosystem partners, deploy in production, they don't go back and keep updating the opset, because production doesn't move at the pace of research. So they are facing multiple issues with opset conversion, for example going from opset 7 or 10 to 13, and we need 13 for quantization, which I'll get to.
B: The most common issue we are seeing is that once we get the model from our customers, neither we nor they are able to convert it to a newer opset. We did try some workarounds, which were filed under this issue, but none of them worked for us either. I think an opset-converter unit test was added in 1.10 or 1.10.1, and that also has some issues.
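[Editor's note: the converter being discussed is presumably the onnx.version_converter module. A minimal sketch of the flow that fails in the way described, assuming a locally saved model; the file names are illustrative:]

```python
import onnx
from onnx import version_converter

# Load a model exported at an older opset (file name is illustrative).
model = onnx.load("model_opset9.onnx")

try:
    # Ask the built-in converter to upgrade the model to opset 13,
    # the minimum needed for the quantization work discussed here.
    converted = version_converter.convert_version(model, 13)
    onnx.save(converted, "model_opset13.onnx")
except Exception as e:
    # In the failure mode described above, conversion raises because
    # a version adapter for some operator is not implemented.
    print(f"Opset conversion failed: {e}")
```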
B
Okay-
and
this
becomes
as
we
are
not
on
offset
13,
several
customers
are
not
on
offset
13.
We
are
not
able
it's
a
show,
stop
or
a
blocker
for
them
to
start
doing
the
work
for
quantization.
We
want
them
to
move
to
int
8
leverage,
good
performance
on
the
it's
available
on
many
different
hardware,
vendors,
but
they
are
not
as
they
are
not
on
13.
They
have
to
stick
with
fp16
or
fp32,
which
they
are
deploying
and
intake
has
become
a
bottleneck
for
it.
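[Editor's note: for context, a minimal sketch of the INT8 path that opset 13 unblocks, using ONNX Runtime's dynamic quantization; the file names are illustrative:]

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# Dynamic quantization converts FP32 weights to INT8 offline; the
# quantization operators it emits require a sufficiently new opset,
# which is why models stuck on opset 9/10 cannot take this path.
quantize_dynamic(
    model_input="model_opset13.onnx",   # illustrative file name
    model_output="model_int8.onnx",
    weight_type=QuantType.QInt8,
)
```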
B
Adoption
has
become
a
bottleneck,
so
the
feedback
we
get
from
customers
is.
They
can
go
back
to
pytorch
code
if
they
want
to
convert
it
right
if
they're
using
pytorch,
but
as
their
researchers
have
moved
to
different
projects
or
they're,
not
in
their
own
company,
then
we
are
at
the
end.
We
are
only
with
the
own
nx
model,
and
that
means
we
are
stuck
at
that
place.
B
So
the
request
is
to
see
if
we
can
solve
this
issue
of
offset
conversion
or
have
better
unit
testing
in
the
future
on
nx
conversion
side
a
converter,
so
we
are
able
to
move
older
offsets
to
newer.
So
we
can
take
advantage
of
performance,
good
accuracy
and
also,
of
course,
intake
quantization.
B
The
other
part
which
we
also
see
like
I
called
we
if
we
are
able
to
convert
these
offset
to
13
I
have
seen,
is
these
models,
and
this
may
be
again
related
to
the
unit
test.
The
python
verification
tool
says
this
is
corrupted
model,
even
though
we
have
converted
so
first
stage.
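[Editor's note: the verification tool referred to is presumably the ONNX checker. A minimal sketch of the check that fails after a nominally successful conversion; the file name is illustrative:]

```python
import onnx

# Load the model produced by the version converter.
converted = onnx.load("model_opset13.onnx")

try:
    # check_model validates the graph against the ONNX spec; in the
    # scenario described above, it rejects a model the converter
    # itself reported as successfully converted.
    onnx.checker.check_model(converted)
    print("Model is valid.")
except onnx.checker.ValidationError as e:
    print(f"Checker reports a corrupted model: {e}")
```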
B
It
should
have
not
converted
itself
if
there
was
an
issue
in
the
offset,
but
that
is
another
common
issue,
so
the
unit
test
or
the
all
the
tests
related
to
every
operators
or
layers
are
not
to
the
point
where
isvs
are
happy,
like
they're,
not
happy
with
with
the
own
and
excite,
and
that's
why
it
is.
B
If
I
see
currently,
most
of
them
are
on
10
or
9
offset,
and
this
has
not
just
scaled
customers
tier
one
customers
are
the
ones
we
say
they
have
big
market
share
in
the
company
for
creators
or
in
collaboration.
Video
collaboration
use
cases
who
are
adopting
on
nx,
and
that
has
become
one
of
the
bottleneck.
Plus
now
has
some
feedback
like
going
from
one
offset
to
another
offset.
B
We
have
to
go
through
another
offset
right
because
there's
some
layers
which
are
not
supported
and
that
also
isv
feedback
is
why
we
need
to
do
this
right.
I
mean
they
don't
have
expertise
or
time,
but
these
are
more
developers
who
are
integrating
ai
and
shipping
in
their
application
instead
of
their
researchers
side.
So
what
their
feedback
is
why
we
cannot
convert
from
one
offset
to
directly
offset
15
for
the
use
cases
they
are
doing.
B: And this all comes down, in the end, to how correct these conversion tools are right now and how the unit tests are implemented, because if these kinds of issues are this common across the ecosystem, it will hurt ONNX adoption of the newly optimized stack. New opsets will keep coming, and we want customers to move to the new opsets for new use cases, but this will hamper adoption.
D: All right, cool. So, a couple of models we had tried converting: specifically the ResNet-50 model and also some of the BERT-based models from the ONNX Model Zoo, and the issue we had in those cases was the opset conversion. For example, if we are jumping from opset 6 to opset 11, it goes through intermediate opset conversions: it goes from six to seven, seven to eight, eight to nine, and so on. So if either of those intermediate steps is missing, then in that case we are blocked.
D: We end up missing features needed to run the model in our inference. So we had issues where the ONNX adapters were not implemented, and we had to do that testing ourselves to resolve those issues. The second example I mentioned, BERT: the BERT-based model still has issues, one because it has a Squeeze operator for which the adapter is not yet implemented.
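[Editor's note: one way to see which intermediate step breaks is to walk the converter one opset at a time, mirroring what it does internally. A minimal sketch; the starting opset and file names are illustrative:]

```python
import onnx
from onnx import version_converter

model = onnx.load("resnet50_opset6.onnx")  # illustrative file name
current = 6  # opset the model was exported with, in this example

# Step through each intermediate opset, as the converter does
# internally, to localize the missing adapter.
for target in range(current + 1, 12):
    try:
        model = version_converter.convert_version(model, target)
    except Exception as e:
        print(f"Adapter missing for opset {target - 1} -> {target}: {e}")
        break
else:
    onnx.save(model, "resnet50_opset11.onnx")
```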
D: So I guess one of the biggest issues I have seen with the version converter currently is maintenance of the tool itself, as we keep adding operators to the ONNX repository.
C: And I would like to highlight the point that Manuj mentioned: the creator segment, as well as some of the intelligent-collaboration use cases.
A: So, you believe this, you know, you want better coverage. Can you share the models that break? Or are there, in your opinion, enough models out there that we can test whether it works? Because often, you know, you want coverage, but it needs models to go through.
B
Yeah,
I
can
bring
companies
get
the
approval
at
least
one
model,
because
it's
one
of
the
top
creator
company,
so
I
have
already
shown
them.
This
means
if
you
see
the
bug
or
I'm
I
added
to
the
bug
someone
opened,
but
I
put
as
a
ecosystem
partner
because
I
don't
have
approval,
but
I
can
try
to
get
one
model
which
is
failing.
Actually,
all
of
their
models
are
failing,
which
is
that's
why
they
are
not
moving
to
new
ops.
B: And the other part, which has improved with 1.10, is the FP16 conversion. That was one of the gaps relative to other IRs in different ecosystems: especially on the mobile side they are deploying FP16, but on the Windows side they were not deploying FP16, and the reason was the conversion. But with 1.10, and with some Microsoft converter tools that have been recently updated (they are not in the same repository), we are seeing those issues starting to get fixed. So it was not only opset conversion; before, it was FP16 as well.
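[Editor's note: the Microsoft tooling referred to is presumably the onnxconverter-common package, which lives outside the main ONNX repository. A minimal sketch of its FP32-to-FP16 conversion; file names are illustrative:]

```python
import onnx
from onnxconverter_common import float16

# Convert an FP32 model's tensors and weights to FP16 for deployment.
model_fp32 = onnx.load("model_fp32.onnx")  # illustrative file name
model_fp16 = float16.convert_float_to_float16(model_fp32)
onnx.save(model_fp16, "model_fp16.onnx")
```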
E: Hello. So, you're talking about the opset conversions, right?
C: Samples would provide enough starting momentum to showcase the differences as well as the similarities, and I think it would also set a path for showing distributed-training infrastructure with the other libraries and communication primitives that are out there in the industry for training. I think we can show the various techniques:
C
Parallelism
and
pipeline
parallelism,
as
applicable
and
onyx
has
a
very
good
opportunity
to
showcase
these,
by
providing
at
least
some
of
the
onyx
models
for
our
training
itself
and
the
other
big
area
that
we
are
seeing.
A
lot
of
involvement
is
the
quantization
aware,
training
and
again
that
is
another.
C
Thing
which
kind
of
quantization
aware
training
scenario.
We
would
also
like
to
request
the
original
fp32
model,
so
that
will
allow
for
for
better
accuracy,
comparisons
and
also
in
the
context
of
quantization,
aware
training.
Mixed
precision
is
also
becoming
prevalent
and
we
have
another
request,
probably
for
the
next
roadmap
session
on
how
do
we?
C: So I think the main thing is to at least start out with one sample, move over to distributed training as an example with the various backend collective libraries, then quantization-aware training, and then the mixed-precision usage: the combinations of FP16 and FP32, or bfloat16 and FP32, or INT8 and FP32. We are seeing a lot of these combinations come up when models are being considered for performance improvements.
C: And we would like to request help from the community: if there are other popular categories that can be contributed, that would really be helpful.
F: Is it with transformers? Oh, perfect.
G: Oh, I was just, this is Calvin here from Lightmatter. With respect to quantization-aware training, one thing that matters is, you know, the precision of accumulation often needs to be higher than, you know, the input and output precision. So that might be something that we need to enable in the ONNX model format.
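[Editor's note: as an illustration of Calvin's point (not part of the ONNX spec today), a small NumPy sketch showing why accumulation needs wider precision than the INT8 inputs and outputs:]

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(-128, 128, size=(64, 256), dtype=np.int8)
b = rng.integers(-128, 128, size=(256, 64), dtype=np.int8)

# Accumulating 256 int8*int8 products overflows an int8 (and even an
# int16) accumulator; a wider int32 accumulator is needed, even though
# inputs and final outputs may be stored at 8 bits.
acc32 = a.astype(np.int32) @ b.astype(np.int32)
print(acc32.min(), acc32.max())  # values far outside the int8 range
```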
C
Thank
you
calvin
for
bringing
that
up
like
a
next
roadmap
session.
We
do
have
like
one
of
the
proposal.
There
is.
How
do
we
better
reflect
the
usage
of
mixed
precisions
within
a
model
as
part
of
metadata
and
trying
to
give
out?
More
probably
like
next
roadmap
session,
will
bring
in
a
detailed
proposal
from
our
site.
H: Okay, yes, yeah, that was a good guess. So, yes, this is a description of a deployment configuration that we are starting to take a look at at Intel. My name is Rodolfo Esteves; I work with Rajeev and other members of these forums. So thank you for the opportunity to describe this.
H
So
so
let
me,
let
me
first
just
very
briefly
touch
onto
the
the
actual
the
actual
characteristics
of
these
of
these,
like
training,
slash,
inference
configuration,
so
we
are
the
for
the
purposes
of
of
of
this
presentation
and
kind
of
inter-generally
calling
distributed
and
federated
learning,
but
but
the
the
topic
or
the
name
that
you
will
see
most
often
associated
with
these.
These
specific.
H
Is
federated
learning
and
like
the
idea
is
that
you
would
do
some
learning
at
the
edge?
So
so
you
you,
you
deploy
your
model
onto
onto
a
fleet
of
of
devices
and
some
of
and
those
devices
just
not
carrying
on.
They
carry
not
only
inference
but
but
they
they
have
a
training
group
themselves
which
which
it
has
implications
on
the
plasticity
of
the
model.
So
they
they.
They
enrich
the
model
with
the
data
that
they
are,
that
they
are
seeing.
H
So
the
the
advantages
of
doing
training
at
the
edge
are
like
well
documented,
and
so
so
here
I
have
a.
I
have
a
by
no
means
comprehensive
list,
but
you
know
like
important
things
include
the
privacy
or
the
technical
considerations,
such
as
latency
efficiency
or
like
use
of
use
of
computational
resources
that
would
otherwise
go
under
underutilized
and
that
sort
of
of
thing.
H
The
the
at
the
the
server
would
collect
like
a
bunch
of
of
such
notifications
and
then
then,
average
them
or
otherwise
consolidate
this
into
a
into
an
a
new
model.
That
includes
these.
These
findings
from
the
from
the
edge
devices
and
broadcast
the
new
the
new
model
for
the
edge
devices
to
to
to
continue
with
with
this.
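[Editor's note: a sketch of the server-side consolidation step being described, assuming the simplest federated-averaging scheme over ONNX models with identical architectures; all file names are illustrative:]

```python
import numpy as np
import onnx
from onnx import numpy_helper

def federated_average(model_paths, out_path):
    """Average the initializers of identically structured ONNX models."""
    models = [onnx.load(p) for p in model_paths]
    base = models[0]
    for i, init in enumerate(base.graph.initializer):
        # Average the i-th weight tensor across all edge models.
        stacked = np.stack(
            [numpy_helper.to_array(m.graph.initializer[i]) for m in models]
        )
        avg = stacked.mean(axis=0).astype(stacked.dtype)
        init.CopyFrom(numpy_helper.from_array(avg, name=init.name))
    onnx.save(base, out_path)

# Illustrative usage: consolidate updates from three edge devices.
federated_average(["edge0.onnx", "edge1.onnx", "edge2.onnx"], "global.onnx")
```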
H
So
so
so,
if
in
the
in
the
specific
case
of
onyx,
the
the
way
that
this
with
these,
this
looks
like
in
a
block
diagram
of
how
this
would
look
like
it's
similar
to
what
I
have
in
the
in
the
in
the
slide
here.
So
once
it
once
the
the
application
receives
the
model
from
from
from
an
update,
they
carry
out
their
basically
like
the
the
pre-processing
and
inference
loop
as
as
normal,
then,
and
then
they
they
somehow
updating
their
own
model.
H
With
a
training
like
a
local
training
loop
at
the
and
at
the
given
intervals,
they
they
update
their.
I
update
the
parameters
to
the
server
and
the
loop,
the
the
and
you
know
like
rinse
and
repeat
sort
of
deal.
So,
as
you
can
imagine
this,
this,
this
sort
of
configuration
is
not
this
well,
like
I
mean
this
is
not.
H
What
we're
proposing
here
is
like
some
some
modifications
I
mean
through
the
to
the
to
the
onyx
format
formatter
to
the
unix,
to
the
onyx
attending
libraries,
so
that
it
would
facilitate
this
this
particular
deployment
using
using
onyx.
So
so
we
believe
that
you
can
do
federated
learning
with
with
onyx
pretty
much
as
it
exists,
but
but
there
would
be
some
modification
that
would
make
it
make
the
the
user
experience
select
the
user,
as
in
like
the
the
developer
of
such
application,.
H
More
more
likely
to
use
to
use
onyx
if,
if
these,
these
modifications
or
these
facilities
existed,
so
in
particular
with
we
believe
that
that
a
minimal
set
of
of
additions
to
to
to
the
api
of
onyx
would
consist
of
is
basically
a
way
to
query
the
part
to
get
the
parameters
of
the
local
model,
to
update
those
parameters
and,
and
and
most
importantly,
it's
some
some
conventional
or
at
least
agreed
upon
metadata
format,
so
that
we
could.
H
We
could
very
much
like,
like
we
would
in
like
transfer
learning
annotate
what
parts
of
the
model
are
allowed
to
change
from
from
different
from
different,
like
from
from
a
learning
or
from
a
from
a
local
look.
That
would
presumably
be
trained
with
a
much
smaller
data
set
than
the
one
that
that
that
the
original
model
was
was
trained
on.
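[Editor's note: a sketch of the kind of API additions being proposed. None of this is an existing ONNX convention; the functions and the metadata key trainable_initializers are hypothetical, layered on top of today's onnx package:]

```python
import onnx
from onnx import helper, numpy_helper

def get_parameters(model):
    """Query the parameters of a local model (proposed API, sketched)."""
    return {t.name: numpy_helper.to_array(t) for t in model.graph.initializer}

def update_parameters(model, params):
    """Overwrite named initializers with new values (proposed API, sketched)."""
    for t in model.graph.initializer:
        if t.name in params:
            t.CopyFrom(numpy_helper.from_array(params[t.name], name=t.name))

# Hypothetical metadata convention marking which tensors a local
# training loop may change, in the spirit of transfer learning.
model = onnx.load("global.onnx")  # illustrative file name
helper.set_model_props(model, {"trainable_initializers": "fc.weight,fc.bias"})
onnx.save(model, "global_annotated.onnx")
```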
H
So,
as
as
you
can
imagine,
the
full
system
or
the
full
implementation
of
such
a
system
has
multiple
design,
design
decisions,
design,
design
points,
and
so
the
idea,
or
like
a
set
a
minimal
set
of
desired
characteristics
of
of
making
such
inclusions
in
the
in
the
the
honest
tonic.
Spec
good
good,
would
very
much
leave
a
lot
of
these
of
these
desires.
Desired
design
points
out
of
scope
so
that
the
the
application
specific
decisions
can
be
made,
and
basically
that
we
would
offload
all
of
these
decisions
to
the
implementers
of
such
a
system.
H
But
there
are
some
things
that
I
that
I
think
that
that
they
would
make
it
if
you,
if
they
were
included
in
the
in
the
unexpected,
they
would
facilitate
such
facilitate
like
kind
of
the
the
basic
infrastructure,
without
limiting
the
the
flexibility
that
would
be
needed
for
application
developers
like
the
three.
The
the
three
items
on
the
list
that
I
that
I
mentioned
before
are
such,
but
also
like
this.
H
This
notion
that
if
you
have,
if
you
are,
if
you're
distributing
your
model
across
a
fleet
of
devices,
probably
like
some
consistency
in
what
version
of
the
model
you
deployed
as
in
the
the
same
model,
would
have
to
have
undergone
the
same
optimization.
So
you
don't
you
know
you
cannot.
Oh
it's
it's
not
clear.
How
would
you
deal
with
parameter
updates
when
one
of
your
some
of
your
devices
are
running
a
quantized
model,
for
example,
versus
a
non-quantized
model
and
that
it
does
those
outstanding
questions?
H
So
so
those
those
set
of
set
of
concerns
seem
to
be
more
appropriate
to
the
two
to
two
encodings
inside
the
model,
rather
than
than
decisions
left
for
the
for
the
the
implementer
of
the
the
the
higher
order,
the
high
the
the
higher
level
system,
yeah
and
but
but
while
still
being
mindful
that
in
many
of
these,
in
many
of
these
cases,
it
would
be
very
useful
to
have
the
kind
of
a
mixture
of
of
or
at
least
not
mandate,
that
in
a
federated
federated
learning
system,
only
onyx
models
are
allowed
to
participate.
H
So
yeah,
as
long
as
the
architecture
is,
is
con.
The
the
model
architecture
is
consistent
and
the
some
edge
devices
would
be
allowed
to
have
a
tensorflow
model
versus
others
using
onyx
as
long
as
the
the
parameter
update
makes
sense
for
both
for
both
of
these
these
options.
In
any
case,
this
is
the
I
my
intention
in
presenting
this
is
kind
of
soliciting
feedback
of
whether
there
is
an
interest
in
in
from
the
from
the
spec
designers
to
pay
attention
to
this
sort
of
configuration
of
of
learning
and
whether
they
would
be
open
to.
A
And
I
made
it
to
6
p.m.
Thank
you
very
much
like
good
presentation,
any
any
questions
comments
for
this
presentation.
F
Yeah
thanks
for
bringing
this
up,
I
was
actually
curious
if
you
tried
to
do
this
with
onyx
runtime,
honest
runtime
supports
training
and
I
believe
we've
had
some
folks
who've
built
some
prototype
or
proof
of
concept
of
doing
federated
learning,
using
that,
oh.
F: I don't know if it's on GitHub or anything, but we can connect and talk about it and see if what was done matches what you're trying to do.
H: All right, I appreciate that. Thank you.
A
Well,
if
not,
I
I
want
to
take
the
speakers.
You
guys
obviously
put
a
tremendous
effort
in
the
presentation
and
that's
really
much
very
much
appreciated
by
the
community
and
we'll
be
looking
forward
to
the
follow
on.
So
the
the
goal
is
to
to
basically
to
present
to
have
those
ideas
presented
to
the
relevant
sig
for
further
in
detail
and
proper
proposals
and
and
eventually
implementations
and
look
forward
to
your
continued
involvement
with
those
ideas.