From YouTube: ONNX Roadmap Discussion #1 20210908
Description
1. Takuya Nakaike (IBM) – New operators for data processing to cover the ML pipeline (e.g. StringConcatenator, StringSplitter, Date)
2. Adam Pocock (Oracle Labs) – C API for the C++ components of ONNX (to assist in wrapping the model-checker functionality)
3. Adam Pocock (Oracle Labs) – Better support for emitting ONNX models from languages beyond Python
A: Okay, so as you know, we have 30 minutes — a presentation of about 10 minutes each. Please leave some time for questions and discussion, and you should know that it's also recorded — which I think I haven't started yet, but I'll do it for sure as soon as I finish.
A: I appreciate it, and the recording will be posted on YouTube, for any colleagues or friends who are also interested in this topic and could not attend. Today we have three presentations: one by Takuya from IBM, and two presentations by Adam, which will be somewhat merged together. Also be aware that the next roadmap discussion is on September 17, at the European-friendly time of 10 am PST. All right — so let me now ask Takuya to present.
C: The problem is that there is no functionality in existing conversion frameworks to represent typical patterns of data pre-processing. Especially, I found that current pipeline frameworks cannot represent new-feature calculation from multiple features, and also that ONNX lacks operators to represent difficult patterns of data processing.
C: The next slide shows the overview of our machine-learning pipeline framework, called the dataframe pipeline. This framework is already open-sourced on GitHub, so if you are interested, please look at it. In this framework you can define a machine-learning pipeline by using our dataframe-pipeline API, and you can define the pre-processing steps like this.
C: For each transformer, you need to specify the input columns, the output columns, and so on. This framework runs on Python at the training phase, like this — I mean that this pipeline framework works on Python for training. After that, a model such as an XGBoost model is trained using the data that is the output of our dataframe pipeline, and our framework can consume the already-converted
C: ONNX machine-learning model and pipeline model, and can then output — export — an ONNX file which includes the pre-processing operators and the model operators. As you already know, this ONNX format can be consumed by ONNX runtimes, such as ONNX Runtime provided by Microsoft.
C: So this is our framework. In this prototype we implemented 11 dataframe transformers in Python and mapped them to ONNX operators.
C: Another difficulty is aggregation operations, such as the frequency encoder and aggregate operators. These operators take one or two columns, and some values obtained at the training phase are used at the prediction (inference) phase.
C: This example shows how we convert the frequency encoder into the ONNX operator LabelEncoder. This operator counts the frequency of the values in a column, and we create a mapping table like this. In this case, "a" appears three times, so "a" is mapped to three, and "b" appears two times, so "b" is mapped to two. Generating this kind of dictionary in Python, we convert this mapping data and embed it into the LabelEncoder like this.
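The conversion just described — count value frequencies at training time, then bake the resulting dictionary into a LabelEncoder node's attributes — can be sketched in a few lines of Python. The helper names below are illustrative, not from the actual dataframe-pipeline code; only the attribute names follow the `ai.onnx.ml` LabelEncoder convention (`keys_strings`/`values_int64s`).

```python
from collections import Counter

def frequency_mapping(column):
    """Training phase: count how often each value appears in a column."""
    return dict(Counter(column))

def label_encoder_attrs(mapping):
    """Conversion phase (hypothetical helper): flatten the dictionary into
    the parallel key/value lists that a LabelEncoder node carries as
    attributes, so no aggregation has to run inside the ONNX graph."""
    keys = sorted(mapping)
    return {"keys_strings": keys,
            "values_int64s": [mapping[k] for k in keys]}

attrs = label_encoder_attrs(frequency_mapping(["a", "b", "a", "b", "a"]))
# attrs["keys_strings"] == ["a", "b"], attrs["values_int64s"] == [3, 2]
```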
C: In other words, our approach does not perform the aggregation operation in ONNX. Instead, we generate the ONNX operator with the aggregated values embedded as attributes.
C: Also, the last three operators are new operators which we prototyped: the StringConcatenator, the StringSplitter, and the Date transformer.
C: This is the final slide — a preliminary experimental result for performance when we convert the dataframe pipeline from Python to ONNX. This is the speedup relative to the Python implementation, and the overall bar is the performance when we run all of the data pre-processing and the machine-learning model on ONNX.
C: As you can see, when we run the pre-processing on ONNX we were able to get a great speedup, such as a 300x performance improvement for categorical encoding, and so on. We also compared the prediction accuracy, like this, and there was not much difference between the original code and the accuracy of our dataframe pipeline using ONNX operators.
B: The set of operators that you showed in the table in the previous slide — the dataframe transformers — is that the complete list of transformers that we would need, or is this only based on some models that you looked at?
C: Currently we introduced only three new ONNX operators. These are the transformers already implemented in our dataframe-pipeline framework: the upper eight transformers can be mapped to existing ONNX operators, and the last three needed the introduction of new ONNX operators, yeah.
C: At this time we did not use any other calendar. I don't remember in detail, but I used an existing C date-parser library, so the implementation is not so tricky, I think.
C: Maybe it may need some such functionality, but at this time we did not pass any timezone data, so that is how it is used. I see — yeah, I remember this: we need to pass some base time to calculate the year or month and so on.
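A minimal sketch of the point being made: a Date transformer has to fix its parsing rules, base time, and timezone at conversion time, because the exported graph cannot consult an ambient locale. The function and feature names below are purely illustrative, using Python's stdlib parser rather than the C library mentioned.

```python
from datetime import datetime, timezone

# Assumed base time baked in at conversion time (here, the Unix epoch).
BASE = datetime(1970, 1, 1, tzinfo=timezone.utc)

def date_features(ts: str):
    """Parse an ISO-8601 timestamp and derive the kinds of features a
    Date transformer might emit; the offset from BASE shows why a base
    time must be embedded in the exported graph."""
    dt = datetime.fromisoformat(ts).replace(tzinfo=timezone.utc)
    return {"year": dt.year,
            "month": dt.month,
            "days_since_base": (dt - BASE).days}

features = date_features("2021-09-08T10:00:00")
```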
A
So
it
sounds
like
this
is
some
work
that
would
be
needed
to
be
done,
while
you
know
doing
the
formal
proposal
of
the
date
to
make
sure
that
it
covers
yeah.
Many
of
the
uses
that
the
community
needs
yeah
one
will
say
one
more
question
from
the
community.
D: Zoom was being temporarily grumpy, so I do not have as much detail as the previous presenter; I just have a couple of relatively quick suggestions. I'm Adam Pocock; I'm in the machine learning research group in Oracle Labs, and we've been working with ONNX more recently from Java, as you might expect at Oracle. These are some suggestions based on the difficulties we've had working with ONNX — emitting ONNX models and interacting with them — on the Java platform, rather than from Python.
D: The first one: the ONNX core project has a lot of functionality in there — like the optimizer package and various other bits — lots of C++ code which just has Python endpoints. It's all wrapped in Python, and it's not clear if the C++ is a valid target for binding. There are lots of things in the ONNX project, and all of them are very useful.
D: I'm focusing on the utilities because they're the bit I care about at the moment, but ONNX does appear to be spreading out across the ML ecosystem.
D: We're going to need to interact with it in languages other than Python. Principally, we would like the model checker to be visible in languages that are not Python, so that I don't have to shell out to Python as part of my unit tests, or ensure there's a valid Python environment on my system when I'm developing something or when I'm trying to deploy it. And we had a use case with ONNX Runtime — which some of you might know about — where the model checker would reject models, but ORT would occasionally segfault when it consumed them, due to various issues in the way it was parsing them.
D: So this model-checking functionality seems to me to be core functionality that would be very useful to expose across other languages. There are also some things about modifying operations, like upgrading between opsets — some of those are offline operations you might be okay using Python for — but the model checking seems very useful, and whilst I would like to get it in Java, I do not expect everyone to write a Java API.
D: That's too much. But a C API is something that most languages can easily bind to without a lot of user code. Java has — or is getting — an automatic system for binding to C APIs and running with them; there are things like SWIG, and lots of other languages have FFIs that let you automatically bind to libraries.
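As an illustration of why a C entry point is the easy target: most languages can call a C function given only its symbol name and signature. The Python `ctypes` sketch below binds `strlen` from libc; Java's foreign-function interface and C#'s P/Invoke work the same way, which is what a hypothetical `onnx_check_model` C function would enable. (That function name is an assumption for illustration — no such entry point exists in the ONNX project today.)

```python
import ctypes
import ctypes.util

# Load the C standard library and describe strlen's C signature;
# a symbol name plus argument/return types is all an FFI needs --
# no C++ name mangling or ABI concerns.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"onnx")  # 4
```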
D
If
you
have
a
c
header,
they
don't
necessarily
cope
as
well
with
c
plus
plus
apis
for
various
reasons,
and
also
it's
not
clear
if
the
c
plus
plus
api
of
any
of
these
tools
is
considered
to
be
an
appropriate
target
for
binding,
which
I
think
is
is
part
of
the
real
problem
that
I
myself
have
with
this.
It's
part
part
of
the
real
thing.
D: I think the situation is similar for ML.NET — I don't think they have access to the checker. I am not as familiar with ML.NET as some of the people on this call will be, so I don't want to speak for them too much, but in general I don't think it's that easy to get hold of this functionality from other platforms. As I said, C APIs are better at interop than C++ APIs, and you can also
D
Bind
them
if
that's
right,
writing
that
will
require
some
effort
and
some
design
effort
and
some
thought
and
construction
and
maintenance
burden
right,
which
might
be
too
much
if,
if
that
is
too
much,
then
is
it
possible
to
sort
of
denote
which
of
the
c
plus
apis
would
be
stable?
Entry
points
will
be
things
that
we
can
easily
access
and
we're
okay
to
bind
from
other
languages,
because
they're
not
going
to
change
out
from
under
us.
D
So
this
is
it's
particularly
a
point,
because
it's
very
common
to
bind
super
plus
apis
in
python,
using
pyrap
or
pi
bind
or
something
so,
for
example,
tensorflow
does
this
and
linux
runtime
does
this
and
they
get
extra
functionality
because
they
bind
directly
to
the
api
and
get
access
to
all
the
c
plus
internals,
rather
than
going
through
a
single
header
file
that
provides
performs
a
sort
of
a
barrier
that
is
the
coded
two
interface,
so
it
is
sort
of
a
question
of
defining.
What
is
the
useful
coded
two
interfaces?
D
Is
it
just
like
the
whole
thing
and
there's
no
subdivision
between
python
and
c
plus
plus?
It
would
be
useful
if
there
was
a
subdivision.
It
would
be
even
more
useful
if
that
subdivision
was
via
a
c
api
that
everybody
else
could
use
as
well.
But
you
know
each
of
these
have
different
development
costs
and
may
well
not
be
of
interest
to
the
community.
D
So
that's
all
really.
I
have
to
say
about
this
specific
point.
I
have
another
slide,
which
is
about
sort
of
dealing
with
onyx
from
from
other
languages
of
python
as
well.
I
can
roll
directly
into
that,
or
we
can
take
questions
about
this
bit
specifically
if
people
are
interested.
D: I'm already maintaining three Java open-source machine learning projects, so I would be happy to participate — I don't think I could write the whole thing myself. To some extent, I am a Java programmer and I can write some C; I am not a C++ programmer, and I'm especially not familiar with modern C++, so there are some aspects where I just don't have the background, and I do not have the time to get up to speed on that background.
A
D
So
so
I
can,
I
can
talk
to
to
a
wider
people
in
the
company
and
see
if
people
are
interested
right.
It
doesn't
seem
worth
trying
to
galvanize
a
large
effort
on
this
like
unless
there's
any
interest,
and
we
didn't
want
to
try
and
fork
anything
because
that's
not
what
benefits.
D: There we go — so this is about exporting ONNX models, emitting them from languages other than Python. I talk about ML.NET a little bit here; I have not been in direct contact with the developers on this topic — I've just looked through their code — so I do not wish to speak for Microsoft; anyone correct me. But ML.NET and Tribuo, our library, are the two projects I know of that export ONNX models from languages other than Python: C# in the case of ML.NET, and Java
D
In
the
case
of
trivia,
both
of
them
have
a
pile
of
onyx
related
helper
functions.
D
You
can
see
microsoft's
package
for
it
there
and
you
can
see
our
package
underneath
both
our
package,
in
particular,
is
currently
under
active
development
and
is
expanding
to
cover
the
set
of
models
that
we
support,
we're
in
the
process
of
doing
this
is
just
the
way
the
roadmap
overlapped
with
our
development
cycle
and
all
this
functionality
or
much
of
this
functionality,
as
far
as
I
can
tell,
is
also
in
this
onyx
converter,
common
project
used
in
on
xml
tools
and
a
few
other
places.
D: That means there are three different implementations of basically identical functionality. I'm not clear on exactly what the ownership is — I assume it's under the ONNX project.
D: Our Java one, as I said, is still very much under construction, but it seems relatively wasteful to have three projects that all let you interact with ONNX models, generate them appropriately, and try to ensure that their invariants are verified — so that you produce nodes with the appropriate attributes and you don't construct malformed graphs.
D: As I said, I would like the model checker to validate that I'm not producing malformed graphs, but we would also like our code to prevent us from producing malformed graphs in the first place, by ensuring that those are type errors or other kinds of errors.
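The idea of catching a malformed node at construction time rather than at export time can be sketched roughly as follows. The required-attribute table here is illustrative, not the real ONNX schema (though `Cast` does require a `to` attribute), and the builder is a made-up name, not part of any of the projects discussed.

```python
# Hypothetical sketch: make "missing required attribute" an error when
# the node is built, instead of a malformed graph discovered later.
REQUIRED_ATTRS = {
    "Cast": {"to"},                 # Cast must say what type it casts to
    "Concat": {"axis"},             # Concat must say which axis to join on
}

def make_node(op_type, attrs):
    missing = REQUIRED_ATTRS.get(op_type, set()) - attrs.keys()
    if missing:
        raise ValueError(f"{op_type} missing attributes: {sorted(missing)}")
    return {"op_type": op_type, "attrs": attrs}

node = make_node("Cast", {"to": 1})   # fine
# make_node("Cast", {})               # would raise ValueError
```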
D
So
it
seems
very
strange
to
me
that
there
are
three
different
implementations,
none
of
which
share
any
code,
and
so
each
one
of
them
could
have
bugs
in
how
it
exports
onyx
models
when
we
fix
any
one
of
those
bugs
nobody
else
benefits
apart
from
the
project
that
depends
upon
that,
and
that
seems
to
be
a
waste
of
effort.
D: In my opinion — admittedly, the Python one, I think, is far and away the most used, so it sees the most development effort and probably has the fewest bugs. Ours is still under development, and I'm not clear on exactly the status of the ML.NET one. But a lot of this stuff — the onnxconverter-common project, for instance — uses a bunch of things from the ONNX project, the main one.
D: If we had a sort of common API that everyone was willing to use, that we could bind in C, then we could all use that same API. That API could end up being part of the spec — the approved way of generating nodes and graphs — and that means there's only one place to validate them.
D: That seems like it would be beneficial. There would definitely be work in migrating all the different use cases on top of this common API, but then we'd all be working on a common API, which would hopefully let us share strength across that and make it easier, and it would help other projects which are trying to write ONNX models outside of Java or C#, or even outside of ML.NET and Tribuo.
D: Our package is certainly reasonably tightly bound into how Tribuo views the world; ML.NET's is also relatively tightly bound to how ML.NET views the world; and the onnxconverter-common one is kind of bound into how scikit-learn views the world, at least a little bit. So it might be better to have an implementation that we could share across all of them, and then other machine-learning libraries on other platforms, or other packages with other paradigms, would have a common language with which to emit ONNX models.
B: Hey Adam — from what you said, am I hearing right that you think onnxconverter-common might have the core set of functionality that you'd want?
D: I think it probably does. I am not especially familiar with it — I only sort of hit it every so often. As part of the work we're doing to add ONNX export functionality, I'm basically looking at how ONNX models are exported via onnxmltools or ML.NET, because the documentation in the ONNX project is not quite specific enough for me to quite understand how it's used.
D: Particularly, I've been looking at the decision-tree stuff recently, and I feel like that could probably do with some wording clarifications, because I really had to go and look at the Python code to try and figure out what exactly it was doing and how everything was bound together — possibly because decision trees are such an alien concept to the sort of tensor-based view of the world that ONNX generally has. But I think a lot of that functionality does exist in there.
D: We all need that, because ONNX requires that names are unique, and there's sort of topology stuff in there about managing different graphs if you're patching graphs together.
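The unique-naming invariant just mentioned is exactly the kind of small utility each emitter ends up re-implementing. A minimal sketch — the class and method names are made up here, not taken from any of the projects discussed:

```python
class NameScope:
    """Hands out names that are guaranteed unique within one graph,
    the invariant ONNX imposes on node and value names."""

    def __init__(self):
        self._used = set()

    def fresh(self, base: str) -> str:
        """Return base unchanged if unused, else base_1, base_2, ..."""
        name, i = base, 0
        while name in self._used:
            i += 1
            name = f"{base}_{i}"
        self._used.add(name)
        return name

scope = NameScope()
first = scope.fresh("MatMul")    # "MatMul"
second = scope.fresh("MatMul")   # "MatMul_1"
```

When graphs are patched together — as in the ensemble case discussed next — each subgraph's names get re-issued through one shared scope, which is the "topology stuff" the speaker is referring to.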
D: That's something that, again, we all need — especially if you're dealing with ensemble models, particularly arbitrary ensemble models, where you just have a vote on top of a variety of different classifiers.
D: So I think a lot of it is there; it's just a matter of getting it exposed in the right way to other places, so it's easily consumable from other projects. I mean, ML.NET is going to have its converter, and we will have our Java ONNX converter, because any effort here will take longer than our release cycle for when we want to have ONNX support. But I think if we all start to work together, it might be beneficial.
D: Yeah — but I misspoke earlier; there's some other stuff that ML.NET uses from ONNX, which is in C and has some sort of wrappers around the protobuf generation. There's also a helper.py in the ONNX project which sort of aids with the generation of the protobufs, and which overlaps with the things in the other languages, and then there's onnxconverter-common, which has a sort of Python-y view of it. But all three of these packages have something that does scoping and something that does naming.
B: If we want it to be kind of a common tool that is used by different folks, I think it probably belongs under the Architecture/Infrastructure SIG — just like, you know, the model checker and some of the other commonly used tools.
A
So
I
don't
have,
I
know
I
know
there
are
folks
that
do
on
java
and
and
are
interested
in
that
so
not
here
in
the
corner,
but
maybe
I
can
put
them
in
touch
with
you.