From YouTube: .NET Design Review: ARM Intrinsics
My initial thought was that I didn't like the approach, but after looking at the C++ usages and realizing that it's doing byte indexing, not bit indexing, I think it makes more sense to use an element index here, and it will probably better match the common use case. And if people do want to do other indexing, they can convert to the appropriate type.
So here in C++ there are multiple overloads for fused multiply-add by selected scalar, and so on. Basically, we have overloads where the sizes for left, right, and addend are the same. But currently in C++, the right size and the addend and left sizes are independent, so they have basically three, or four, different overloads instead of two. And the reason why, just as Mark suggested, is that they don't have an easy way to go between Vector128 and Vector64; in our case we can downcast from Vector128.
It's not quite that simple! With most of the instructions the sizes of all operands are the same, but in the case of FMLA there's both a Q bit and an H bit: the Q bit controls the size of the result, the addend, and the left, but the H bit controls the size of the right. So you can actually have a Vector64 left and a Vector128 right.
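A sketch of what those overload shapes could look like on the .NET side (the method name and exact signatures here are my illustration, not necessarily the surface under review):

```csharp
// Illustrative only: overload shapes for an FMLA-style "by selected scalar"
// helper. The Q bit sizes the result/addend/left together, while the H bit
// independently sizes 'right', so a Vector64 left can pair with a
// Vector128 right.
public static Vector64<float> FusedMultiplyAddBySelectedScalar(
    Vector64<float> addend, Vector64<float> left,
    Vector64<float> right, byte rightIndex) => throw new PlatformNotSupportedException();

public static Vector64<float> FusedMultiplyAddBySelectedScalar(
    Vector64<float> addend, Vector64<float> left,
    Vector128<float> right, byte rightIndex) => throw new PlatformNotSupportedException();
```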
All the 128-bit ones also need one that takes a 64-bit overload as the third parameter. So, okay, for each Vector64 we'll have one additional one, and for each Vector128 we'll have one additional one. I don't know how trivial it would be, but another option is we just always take that third operand as a Vector128 and we look for a pattern around it.
Well, so one of the patterns where I frequently use such overloads is where I have to create a constant and just replicate it across all lanes. The idea is to use a smaller value set and then take the zero index from that, so that I only need a smaller load or a small MOVI or something.
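That pattern can be sketched as follows (hypothetical usage; `MultiplyBySelectedScalar` stands in here for whichever by-element overload applies):

```csharp
// Keep the constant in a small Vector64 and index element 0, instead of
// materializing a full Vector128 broadcast of the same value.
Vector64<float> scale = Vector64.Create(2.5f);             // small load / MOVI
Vector128<float> data = Vector128.Create(1f, 2f, 3f, 4f);
// Multiply every element of 'data' by element 0 of 'scale'.
Vector128<float> scaled = AdvSimd.MultiplyBySelectedScalar(data, scale, 0);
```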
So I'm going to introduce a couple of concepts here real quick. On Arm, many of the instructions have a few different overloads. There's the base instruction, which just does the normal operation, and then many of them also have versions that do rounding or saturation. That's where you see, for example, add high here and then rounded add high: they're essentially the same instruction, but there's a bit changed in the underlying instruction.
The closest concept we have is in System.Math: we've got BigMul and we've also got multiply high, and those are the closest existing concepts we have for some of these. BigMul is where you're taking, for example, two 32-bit values and producing a 64-bit result, and then multiply high is the same except you only return the upper half of the result. Okay.
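The System.Math analogy can be sketched like this (the `BigMul` shown is the long-standing `Math.BigMul(int, int)` overload; the "multiply high" step is just shifting off the lower half):

```csharp
// Two 32-bit values produce the full 64-bit product.
long full = Math.BigMul(0x12345678, 0x1000);
// "Multiply high" is the same operation, returning only the upper 32 bits.
int high = (int)(full >> 32);
```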
So there are basically two parts to this instruction, similar to some of the other ones we've reviewed, where this one in particular deals with the lower half of the vectors, and then there's another one which will do the same operation but write it to the upper half of the result. Because you're narrowing the result from short to byte, you end up with a vector that can contain twice as many elements.
…writes the vector to the lower half of the destination register and clears the upper half, while the upper variant writes the vector to the upper half of the destination without affecting the other bits of the register. So they both do an add high operation; they both do the operation described there in the comment, or in the summary, but it's a question of whether they write the result to the lower or the upper half of the destination vector.
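A hedged sketch of how the lower/upper pair might compose (the names are modeled on the underlying ADDHN/ADDHN2 instructions; the actual API names in the review may differ):

```csharp
// Narrowing ushort -> byte fills only half a 128-bit vector, so the pair
// lets callers fill the lower half of a destination first, then the upper
// half of the same destination.
Vector64<byte>  lo   = AdvSimd.AddHighNarrowingLower(leftA, rightA);
Vector128<byte> full = AdvSimd.AddHighNarrowingUpper(lo, leftB, rightB);
```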
This is because there are two parts to it. One is, if we took a scalar, the JIT would need significant work in order to understand that integer values can be held in SIMD registers. And the other part is that the actual thing being taken in is a Vector64; it's only operating on the lowest element.
So something that Igor and I had discussed on another one of the issues was, rather than using ValueTuple here, we could do what C++ does and define a custom type. For example, we would have Vector128 by itself for the single element, and then for the two-element tuple we would call it Vector128x2, which roughly matches the C++ naming, for example.
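A possible shape for that custom type (the name and members are illustrative, not a committed API):

```csharp
// Unlike ValueTuple, the elements are get-only, so the JIT could treat the
// struct as immutable when optimizing.
public readonly struct Vector128x2<T> where T : struct
{
    public Vector128<T> Value1 { get; }
    public Vector128<T> Value2 { get; }

    public Vector128x2(Vector128<T> value1, Vector128<T> value2) =>
        (Value1, Value2) = (value1, value2);
}
```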
I imagine, at least from the perspective of usability, it might be easier to define a custom type. One, it lets us extend custom functionality on it in the future, for example debugger display, things like that; but also we can make sure that it's immutable, plus some of the other niceties. ValueTuple has public fields, which means users are free to take the address of them and do all kinds of weird things with it, which might make it harder to do various optimizations.
I mean, taking the address of something? It's just… to try…
The fields are public, so users are free to take the address of individual fields, read and write individual fields, etc. I think it's just a potential concern: there are more places where a user could be doing something they think is clever, but that in actuality ends up hurting codegen.
Well, it's basically the case of, like, with the vector types we've got specific GetElement methods that are able to explicitly optimize, for example, getting the x value of a Vector128<float> down to a specific instruction. If we have a custom type, we can likewise have the property getters for, you know, elements 0, 1, 2, and 3 that are specifically optimized by the JIT, knowing that it's otherwise immutable, whereas with ValueTuple that will be much harder to recognize and specialize. Yeah.
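For reference, the element access being described looks like this with today's vector types (`GetElement` is the existing helper; the single-instruction lowering is the JIT behavior being described, not something the snippet itself proves):

```csharp
Vector128<float> v = Vector128.Create(1f, 2f, 3f, 4f);
// The JIT can recognize this and emit a single element-extract instruction.
float x = v.GetElement(0);
```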
Well, you know, I would love to. I don't know where that winds up relative to other stuff we're working on, because, you know, without actually getting to the point where we can register-allocate these things, I don't know how much progress we can make on it. Nothing? No? Okay!
Yeah, I mean, this is what it's called in the documentation, but they do point out the list can be a variable or semi-variable-length thing. But yeah, I don't have a better name. I don't know how we'd do it, but we could, if we wanted to. Just playing devil's advocate: we could expose the flattened versions of the one, two, three, and four, right, and then we can add a packed one later, if we add the packed register type.