From YouTube: ONNX Training Working group meeting 17 Sep. 2019
Description
Recording of ONNX Training Working group meeting 17 Sep. 2019
C
So, first of all, you know, in our last meeting, I...
C
Right, we were going to put in comments and merge these PRs into a local branch to see whether those signatures work with each other. So I created this PR, which is the merged branch, and I hope it'll be a centralized place for all the training stuff, because, as we have discussed previously, it's hard to have a global view of all the training stuff in multiple separate PRs.
C
[inaudible] Okay, so one comment I got from our discussion is that it's hard to see what the actual training graph looks like, so I further added a function to create a model purely for training. It will extract the training algorithm from the model and put it in another model so that you can just visualize it.
C
And this outputs the gradient of the weight, and the optimizer gets the weight to be updated and the gradient of that weight. The gradient coming from the gradient operation here is [inaudible] with respect to the linear weight. We're also getting [inaudible] the state of the momentum on top of the linear weight, and then the optimizer comes up with the new weight and the new momentum from the current input, and you get the new one. So I hope it helps.
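To make the data flow described here concrete, the following is a minimal sketch of an optimizer step that takes a weight, its gradient, and a momentum state, and returns the new weight and the new momentum. The speaker does not say which update rule the graph uses; plain SGD with momentum is assumed here, and the function name, learning rate, and decay factor are all hypothetical.

```python
def momentum_step(weight, grad, velocity, lr=0.1, beta=0.9):
    """One SGD-with-momentum update on flat lists of floats (assumed rule).

    Inputs mirror the description: the weight to update, its gradient,
    and the current momentum state. Outputs are the new weight and the
    new momentum state.
    """
    new_velocity = [beta * v + g for v, g in zip(velocity, grad)]
    new_weight = [w - lr * nv for w, nv in zip(weight, new_velocity)]
    return new_weight, new_velocity


# Hypothetical values, for illustration only.
weight, velocity = [1.0, -2.0], [0.0, 0.0]
grad = [0.5, -0.5]  # in the real graph this would come from the gradient operation
weight, velocity = momentum_step(weight, grad, velocity)
```

The returned pair would be fed back in as the weight and momentum inputs of the next training iteration.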
C
[inaudible] so everyone can try it. It doesn't handle every case, because [inaudible] generating the graph [inaudible]. Some of that is programming-language territory; you need a compiler. But it gives everyone an overview of how to use those newly introduced components to compose the model.
E
Anyway, yeah, I mean, ignoring the exact details of the representation, I guess the gradient... one way to look at this...
C
Yeah, you don't need to differentiate the whole graph; you can just find a subgraph of it and do the differentiation, because sometimes in your pipeline some operators are not differentiable. For example, the feature-engineering part [inaudible]: those operators are not differentiable. But after the feature engineering finishes, you build a neural network model, which is differentiable, and it's just a subgraph of the whole pipeline.
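A small sketch may help illustrate the point about differentiating only part of a pipeline. It assumes a hypothetical pipeline with a non-differentiable bucketing step followed by a one-weight linear model with a squared loss; none of these names or values come from the meeting.

```python
def bucketize(x, edges):
    """Non-differentiable feature engineering: map a value to a bucket index."""
    return float(sum(1 for e in edges if x >= e))


def loss_and_grad(w, feature, label):
    """Differentiable subgraph only: pred = w * feature, loss = (pred - label)^2.

    The gradient is taken with respect to w; the feature is treated as a
    plain input, so nothing upstream of it needs to be differentiable.
    """
    pred = w * feature
    loss = (pred - label) ** 2
    dloss_dw = 2.0 * (pred - label) * feature
    return loss, dloss_dw


raw = 7.3
feature = bucketize(raw, edges=[0.0, 5.0, 10.0])  # runs outside the differentiated region
loss, grad = loss_and_grad(w=0.5, feature=feature, label=3.0)
```

Here differentiation stops at the output of the bucketing step, which is the behavior the speaker describes for the feature-engineering part of a pipeline.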
F
Okay, I can probably explain a bit here. So this is a general form, enabling a representation of the gradient graph for any subgraph. The gradient doesn't always start back-propagating from the last function; it can start in the middle and then back-propagate to a certain point where we need the gradient. So the attributes of this gradient node are just defining the starting point and ending point of those.
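As a rough illustration of how two attributes can delimit the region to differentiate, the sketch below models a gradient node as a pair of tensor-name attributes and selects the forward nodes that lie between them. The attribute names `xs` and `y` follow the "y and xs" wording used in this meeting; the toy graph, the class, and the helper function are hypothetical and are not the ONNX API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GradientNode:
    xs: tuple  # tensors to differentiate with respect to
    y: str     # tensor being differentiated; may sit in the middle of the graph


# Toy forward graph as (op, input names, output name), in topological order.
GRAPH = [
    ("bucketize", ("raw",), "x"),
    ("mul", ("w", "x"), "pred"),
    ("sub", ("pred", "label"), "diff"),
    ("square", ("diff",), "loss"),
]


def differentiated_nodes(graph, node):
    """Return outputs of forward nodes lying on a path from node.xs to node.y."""
    reach = set(node.xs)  # tensors downstream of any x
    for _, inputs, output in graph:
        if any(i in reach for i in inputs):
            reach.add(output)
    needed = {node.y}     # tensors upstream of y
    selected = []
    for _, inputs, output in reversed(graph):
        if output in needed:
            needed.update(inputs)
            if any(i in reach for i in inputs):
                selected.append(output)
    return list(reversed(selected))


full = differentiated_nodes(GRAPH, GradientNode(xs=("w",), y="loss"))
middle = differentiated_nodes(GRAPH, GradientNode(xs=("w",), y="pred"))
```

With `y="loss"` the region covers the multiply, subtract, and square nodes; with `y="pred"` back-propagation would start in the middle and cover only the multiply node. The bucketing node is never included because it is not downstream of `w`.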
E
Is that the constant that is being passed here? No, no. So I guess the other way to look at this is that there are two stages to this computation. First, we need to construct the subgraph that computes the gradients, the back-propagation graph, so think of this as a kind of higher-order function. You specify the y and the xs, and we construct a subgraph that corresponds to the back-propagation subgraph. And then you need to evaluate this back-propagation subgraph for specific inputs and outputs.
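The two-stage, higher-order-function view can be sketched as follows. This is a toy reverse-mode differentiator over a three-node graph, written only to show the "construct, then evaluate" split; the graph encoding, the operator tables, and `build_gradient` are assumptions made for illustration and do not reflect how any runtime implements the proposal.

```python
FORWARD = {
    "mul": lambda a, b: a * b,
    "sub": lambda a, b: a - b,
    "square": lambda a: a * a,
}
# Partial derivatives of each op's output with respect to each of its inputs.
PARTIALS = {
    "mul": lambda a, b: (b, a),
    "sub": lambda a, b: (1.0, -1.0),
    "square": lambda a: (2.0 * a,),
}
GRAPH = [
    ("mul", ("w", "x"), "pred"),
    ("sub", ("pred", "label"), "diff"),
    ("square", ("diff",), "loss"),
]


def build_gradient(graph, xs, y):
    """Stage 1: pick the nodes that back-propagation from y must visit."""
    needed = {y}
    backward_nodes = []
    for op, inputs, output in reversed(graph):
        if output in needed:
            backward_nodes.append((op, inputs, output))
            needed.update(inputs)

    def evaluate(feeds):
        """Stage 2: run forward, then back-propagate for these specific inputs."""
        vals = dict(feeds)
        for op, inputs, output in graph:
            vals[output] = FORWARD[op](*(vals[i] for i in inputs))
        adj = {y: 1.0}
        for op, inputs, output in backward_nodes:
            partials = PARTIALS[op](*(vals[i] for i in inputs))
            for name, partial in zip(inputs, partials):
                adj[name] = adj.get(name, 0.0) + adj[output] * partial
        return [adj.get(x, 0.0) for x in xs]

    return evaluate


grad_fn = build_gradient(GRAPH, xs=["w"], y="loss")        # constructed once
batch_1 = grad_fn({"w": 0.5, "x": 2.0, "label": 3.0})      # evaluated per batch
batch_2 = grad_fn({"w": 0.5, "x": 1.0, "label": 0.0})
```

Because construction and evaluation are separate, the same constructed function can be evaluated on a different batch without being rebuilt.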
C
It's implicitly connected. Here you can see the starting points are the linear weight and the label, and the ending point is the loss value, and [inaudible] the one, two, three, the three variables here. They form the scope in your model, and we'll just take everything in that scope as the graph to differentiate.
C
To the previous question, why it is not connected to the inference graph: well, it's implicitly connected to the inference graph, because here the graph we are going to differentiate starts from the linear weight and the label and ends at the loss value, and you can see the starting points include the label. We basically can touch all of the inference function and the loss function and end at the loss value, so it is implicitly connected to them.
C
So this is for allowing different inputs. For example, I can identify the subgraph and use another batch to compute its gradient.
E
I mean, I think that the representation is not perfect, but it is trying to retain some of the characteristics and invariants of the original graph. So, as I said, really this node is capturing two stages of computation: one is the construction of the gradient subgraph, and the second is the evaluation of the gradient subgraph.
E
If you ignore optimization questions, the gradient subgraph computation can be thought of as a computation that takes the input and label and produces the deltas. That is what this set of inputs and outputs is trying to capture: the fact that, ultimately, the subgraph that is produced takes the input and label and produces some deltas. Internally, the implementation may, as an optimization, simply directly reuse the computation that goes on in the left, like in the inference function and the loss value [inaudible].
F
Hi, I want to chime in here. I don't think we want to overcomplicate this. We already spent a lot of effort trying to validate this, and I personally feel it's time to bring this up to the community and hear the feedback first, before we put an extra burden on ourselves.
C
Okay, so you will see a simple graph and how we can connect the gradient into the inference graph and [inaudible] some operators. There are many more examples with visualization here. You can read them, and we can discuss if you have questions or something. And here there is maybe an explanation for the attributes; you can see [inaudible].
E
Oh yeah, I mean, I agree with what [inaudible] was saying earlier. I think it's good to push this forward and take this to the community, because even with the standard ONNX, the infrastructure evolved over a long period after the original thing started [inaudible], until, you know, the proposal is accepted.
F
Yeah, so I think we still need to clarify our expectations here. We are not trying to build a runtime engine; we are trying to define a spec that can represent training for a runtime. So I still want to reiterate a clear cut: there are some jobs that should be left to the runtime engine, for example, implementing the actual gradient op. That's basically the essence of the whole training framework, using autograd or autodiff.
F
I understand that. I feel that you have the concern of whether this thing will work with their engine. Yes, that's because we don't have a binary, a training engine, that implements this. Well, I can tell you that ONNX Runtime is taking this spec and we're implementing it. So.
G
So that's why I think it's important to at least have, you know, the PyTorch or TensorFlow converter people see this spec and tell us whether they can produce this. Okay, that's my point. So maybe next week [inaudible]. We have, you know, Gunther; I think he's the guy who is taking TensorFlow and converting that into ONNX. And for PyTorch, I don't know who it is, maybe [inaudible]. Yeah. That is a way to make sure the right people are in our next meeting, so they can make [inaudible] comments on this, right?
G
Yeah, I believe with this [inaudible] I did some exercises, looking at the training [inaudible], trying to, you know, do a trainable model [inaudible]. That seems to be okay. Of course, you might have changed something here, and I may need to look into the changes, but it shouldn't be a big change. So I don't see a big problem there. Right, yeah.
F
I think I have proposed the procedure to be like this: let's get a high-level agreement on this general, one big PR, to make sure that our approach is correct. Once we have that sign-off from the committee, we can go back and work on each individual PR, work on the details, and try to get those sub-PRs [inaudible].
C
Well, to create this PR I spent two days merging [inaudible], and I made some necessary changes to sync them, because some PRs have interactions; I cannot just [inaudible] one and then another. If we want to merge those changes back, I cannot use git commands. I need to do some manual management, manual git manipulation of my code.
A
So, as I said, we'll just meet in two weeks, and [inaudible] gives me a list of people to invite, right?