From YouTube: Code Generators and Much More (Part II)
Description
Thanks to @nbhuiyan who fixed the first 12 minutes of audio that were missing from the initial video! Here is the newly re-recorded and stitched together Compiler Vitality Talk -- Code Generators and Much More (Part II).
Four bytes, if the data item is aligned at an address that is a multiple of four, and so on and so forth. The reason I brought up this data atomicity topic is that Java has an idiom called array copy. Array copy semantics are defined in terms of the array elements, and you can imagine, for an integer array, the array copy semantics are defined as copying one array element at a time. Because the elements are naturally aligned, they are already guaranteed to be copied with data atomicity.
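As a concrete sketch of the idiom (the class and method names here are mine, not from the talk): for an int[], each 4-byte, naturally aligned element is read and written as a unit, so a concurrent reader never sees a torn element.

```java
import java.util.Arrays;

// Sketch of the array-copy idiom discussed above: System.arraycopy is
// defined element by element, so for an int[] each 4-byte, naturally
// aligned element is copied as a unit (never torn).
public class ArrayCopyDemo {
    static int[] copy(int[] src) {
        int[] dst = new int[src.length];
        System.arraycopy(src, 0, dst, 0, src.length); // element-wise semantics
        return dst;
    }

    public static void main(String[] args) {
        int[] src = {10, 20, 30, 40};
        System.out.println(Arrays.toString(copy(src))); // [10, 20, 30, 40]
    }
}
```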
All these concepts I will go into in detail in later slides; here I just briefly mention where the three main architectures differ or are the same in these properties. For example, the I-cache on all three platforms right now is coherent, and the D-cache is also coherent, but they differ in write-back versus write-through, write-allocate versus no write-allocate, and also in whether the outer-level caches, the L2 and L3, are inclusive or victim caches; they are different there as well. Going to the next slide.
Now we are going to describe coherent versus noncoherent. Here I have an example of instructions executing on processor 0 and processor 1, and we compare how coherent and noncoherent differ in behavior. I mark the processor, the cache, and the memory as the different layers of hardware in the system. The memory here, you can imagine, is an outer-level cache or real memory; as long as it is beyond the current cache, it's fine.
Processor 0 executes an add of 9 to x, and later on processor 1 executes an add of 99 to x. When it is coherent, you can see the state transition in processor 0's cache: the cache line holding x goes from the initial value 0 to 9, and then to "no". "No" here means the cache line is not going to be present in that cache; it is going to be invalidated. Why is it invalidated? Because it is invalidated by the later store to x on P1. That is what coherent means: the line is automatically invalidated in the neighboring P0's cache. On the memory side, the value of x transitions from the initial 0 to 9 and eventually to 108, and in P1's cache it goes from 0, through 9, to 108. So it's coherent. On the other hand, when it's noncoherent, things are going to be very absurd.
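The coherent outcome above, x going 0 to 9 to 108, is exactly what a Java programmer relies on. A minimal sketch with AtomicInteger (class and method names are mine): the final value is 108 whichever order the two additions reach the shared line.

```java
import java.util.concurrent.atomic.AtomicInteger;

// The P0/P1 example as Java: two threads add 9 and 99 to a shared x.
// Cache coherence plus the atomic read-modify-write guarantees the final
// value 108, whichever order the two additions hit the cache line.
public class CoherenceDemo {
    static int run() throws InterruptedException {
        AtomicInteger x = new AtomicInteger(0);
        Thread p0 = new Thread(() -> x.addAndGet(9));   // P0: x = x + 9
        Thread p1 = new Thread(() -> x.addAndGet(99));  // P1: x = x + 99
        p0.start(); p1.start();
        p0.join(); p1.join();
        return x.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run()); // 108 on every run
    }
}
```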
From now on I am assuming the data cache has to be coherent in a multiprocessor. A noncoherent data cache only exists in the past, or in a uniprocessor system: there you can have a noncoherent data cache and manage yourself when the data is pushed down to the memory; you can do that. But in a multiprocessor system, if the data cache is noncoherent, you cannot imagine how it's going to work. On the other hand, the instruction cache can be left incoherent, because the instruction cache is modified very rarely in a typical program.
You can see initially level 1 and level 2 are empty; then you read x, and the line is brought into level 2 and level 1. Then you need y, and now both x and y are brought into level 2 and level 1, so it's inclusive there. Then suppose, for some reason, x is evicted from level 1. It can still stay in level 2, because level 2 has a bigger capacity: evicted from level 1, the line can stay in level 2.
But you are still inclusive: level 2 contains x and y, level 1 only has y; that's fine. Then you have a back-invalidation. What happens is you have coherency traffic from the external world to level 2, and it invalidates, evicts, y from level 2. As long as you are an inclusive cache, because y is evicted from level 2 you need to do a back-invalidation to level 1. So it is going to do a back-invalidation of y in level 1, and y in level 1 is evicted, invalidated.
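The back-invalidation rule above can be sketched as a toy model (all names are mine; this illustrates the inclusion property only, not any real hardware):

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of inclusive-cache back-invalidation: inclusion means L1 is a
// subset of L2, so evicting a line from L2 (e.g. due to external coherency
// traffic) must also invalidate it in L1.
public class InclusiveCacheModel {
    final Set<String> l1 = new HashSet<>();
    final Set<String> l2 = new HashSet<>();

    void read(String line) {        // a read fills both levels
        l2.add(line);
        l1.add(line);
    }

    void evictFromL2(String line) { // external invalidation hits L2...
        l2.remove(line);
        l1.remove(line);            // ...and back-invalidates L1 to keep L1 ⊆ L2
    }

    public static void main(String[] args) {
        InclusiveCacheModel c = new InclusiveCacheModel();
        c.read("x");
        c.read("y");
        c.evictFromL2("y");                     // coherency traffic evicts y
        System.out.println(c.l1.contains("y")); // false: y was back-invalidated
    }
}
```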
Now for the victim cache: initially, suppose level 1 is containing A, B, and C, and level 2 is the victim cache. Now the processor needs the content of cache line P, and it conflicts with these. What happens there is that P is brought into level 1, but not brought into level 2, because it is a victim cache. At the same time A is evicted. Where is A going? It is going to the victim cache. P replaces A, because A initially is the least recently used line: basically A is going to be evicted, so P will replace A, and level 1 becomes P, B, and C, with the LRU order updated, since you never re-used A and B. Then later on, the processor needs A again. What happens? You get an exchange: P needs to be kicked out, and A is brought into the cache. So what happens here is that, because P is kicked out, it goes to the victim cache, it is pushed out, and at the same time A is brought into level 1. So it is basically doing a swap. Okay, so these are the two kinds of cache hierarchy.
What the trade-off here is: it is basically visible in the capacity. For a victim cache, you can imagine the level 2 cache, or whatever the external-level cache is, as a capacity expansion of the inner-level cache. So if you have 512 kilobytes of level 2 and 32 kilobytes of level 1, your total cache capacity is pretty much 32 kilobytes plus 512 kilobytes. But on the inclusive side, your capacity is pretty much bounded by the external level of cache. Then, for a victim cache, you need to pay a cost in bandwidth: for the coherency you need to snoop along more pathways, because you don't know where a line is.
For example, you have external coherency traffic coming in to invalidate A. You don't know whether A is in the level 2 or the level 1, so typically it is going to snoop both caches in parallel. So there you will pay a higher cost in bandwidth. Okay, next slide: write-back versus write-through. Here is an example, and a short description, of write-back versus write-through.
Basically, it is about how your data reaches the external level: the external-level cache, or memory, gets the value either by eviction from the inner level, or as part of the write itself. If it is as part of the write itself, it is write-through; if it is later on, as part of the eviction, then it is write-back. So in the example here, on p0 you do x = x + 1; I am talking about the cache line containing x. Note that here I don't have y at all. You can see x becoming 1, because p0 requires x, so x is brought into p0's cache. And later on, p1 does its store, x = 99; then, as part of the store operation, the line in the p0 cache is not evicted but invalidated, and p1's cache will contain the x value of 99.
Going to the next slide: write-allocate versus no-write-allocate. The simple short description here is whether your cache allocates a line for the written cache line as part of the write. A cache is typically going to be populated when you do a read; but for a write, do you allocate a cache line when you do the write, or not?
Writes are typically things you write but don't need in the near future, and writes are relatively smaller in quantity compared to reads. So when you do write-allocate, the recently written data will remain closer to the CPU; and with no-write-allocate, your written data doesn't trash the cache. So that's the trade-off here: one is the written data remains closer; the other is that the written data doesn't compete for the capacity of the cache, so it doesn't occupy the inner level of cache and the capacity can be retained for the read data. Okay, so that's the trade-off here. Going to the next slide.
So here we are showing the cache architecture on the different CPUs. You can see for Skylake (there are a lot of flavors of Skylake; Skylake-SP is the bigger one among those flavors) the level 3 cache became a victim cache, not inclusive. On the other Skylake, the smaller client, laptop version of Skylake, the level 2 actually is 256 kilobytes.
And the level 3 actually is inclusive for that version of Skylake. You can understand why they became victim caches: if, in the bigger flavor, the level 3 cache were inclusive, then look at your level 3 cache capacity. You can imagine you have 28 cores, each core has one megabyte of level 2, and if it is inclusive, your 39 megabytes of level 3 need to include the 28 megabytes of level 2, because the level 3 is shared.
Your level 3 cache total capacity is not even twice the total level 2 cache capacity, so if it is kept inclusive, the value of your level 3 cache is reduced significantly in this configuration. The POWER side has always used a victim level 3: from its history, going back ten years, the level 3 has been a victim cache.
Z is always the inclusive one, because they have such a big level 3, so they are inclusive. And write-back versus write-allocate is a mix everywhere: x86/Skylake is write-back, write-allocate, and POWER9 is write-through, with no write-allocate at all, on the L1 D-cache.
So on P9, basically, if you only do a write, the data is never in the L1 D-cache; the write is only made to the level 2, and goes all the way through, so it is write-through.
So when you do a write, it is going to write through to the L2, and to the L3, all the way out in one go, not as part of an eviction. And this has implications for allocation, for the thread local heap clearing; later I will show you. Going to the next slide: now on to I-cache coherency and CMODX.
What is CMODX, actually? Basically, in a JIT runtime you do relatively frequent instruction modification, as opposed to other programs, which don't do much instruction modification. So, in order to do code patching, the instruction modification, you are certainly assuming that what you modify is aligned, so you get the data atomicity. If you don't have data atomicity, the instruction, written as a piece of data, will not be written integrally, and then your patching will not work. This is an assumption, for sure.
In addition to that data atomicity assumption, there are two more things here. One requirement for code patching to work is that you need to take care of I-cache coherency, because the I-cache on certain processors is not coherent. I-cache coherency basically governs when your modification will be seen by another processor: you do the modification; does your neighboring processor pick it up or not, and when is it going to pick it up? That is I-cache coherency. And then there is what CMODX actually is.
The best description of CMODX is concurrent modification and execution. Typically, in other languages, the instruction modification and the execution of that instruction do not happen concurrently: typically your compiler generates the code, and then you send that code to be executed; those are two separate points in time. But for JIT code patching that is not the case; it happens concurrently. You have multiple threads, and while one thread is doing the modification, the other threads are possibly, right now, actually executing that instruction. That is what is governed by this CMODX.
A
This
both
this
Akashi
coherency
and
Similac
spec,
a
processor
implementation
specific
and
our
one-time
need
to
know
so
if
I
cut
is
not
coherent
with
respect
to
data
cache,
what
you
need
to
do
one
timely
to
sync
it
up.
Basically,
when
you
do
the
modification,
because
here
we
nowadays,
we
always
assuming
Howard
architectures,
not
one
human
objection,
it.
A
We
really
need
to
have
a
clear
definition
of
what
really
have
an
undefined.
So
right
now
on
a
power
side,
the
halfway
relatively
clear
definition
in
actually
in
the
next
version
of
architecture
coming
out
and
on
X
and
V
right
now,
busy
is
either
bright,
chart
or
try
an
error.
You
try
something
up,
however,
it
motor
it
works.
Somehow, whatever modification you do still behaves. And this kind of concurrent modification and execution situation really only exists in a dynamic runtime and in a debugger, because a debugger has this situation as well: you have concurrent modification and execution. But in the debugger scenario, typically, the world is stopped when you do the modification. Okay, next slide.
Now here, basically, is the cache architecture relevant to our code generation. As I mentioned, in school they basically say the cache is transparent: you don't need to care about the cache, because everything is transparent and will just work. But it certainly is not transparent from a performance perspective. For example, the cache line size: you have the trade-off of memory bandwidth versus spatial locality there.
When you have a 256-byte cache line, you expect a lot of data there: you are using the first byte, and you probably expect to use the next bytes, which were brought into the cache, so you are going to use a lot from that cache line. But if you are not using a lot from that cache line, you basically waste a lot of memory bandwidth, because you brought in 256 bytes but you only used, for example, 4 bytes. Then you have basically wasted memory bandwidth against your locality.
A big cache line can also give rise to false contention, causing unnecessary contention: if you could break the data into different cache lines, there would be no contention there. That is false sharing, and it can easily be a ten-times difference in performance.
If you have false sharing going on, then even if you have what is called cache intervention, where you can hand the data from your processor's cache to your neighboring processor's cache, that is still, although faster than coming from memory, hundreds of cycles. So it is easily ten times as long.
And in terms of Java, I think we have the @Contended annotation to request that a piece of data needs to be on its own cache line; there is such an annotation to avoid false sharing, but I don't think J9 currently honors this annotation; we don't do that. This is also relevant to cache line sharing, and relevant to thread local heap clearing.
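A portable sketch of the padding idea (the @Contended annotation itself is a JDK-internal mechanism; the class names and padding sizes below are my own illustration, not J9 code): each hot counter gets trailing padding so the two cannot share a cache line.

```java
// Manual-padding sketch of the false-sharing fix discussed above. The
// trailing longs give ~120 bytes of separation after each hot field,
// covering common 64/128-byte lines (Z's 256-byte lines would need more).
public class PaddedCounters {
    static class Padded {
        volatile long value;
        long p01, p02, p03, p04, p05, p06, p07, p08,
             p09, p10, p11, p12, p13, p14, p15;    // padding, never read
    }

    static long[] run(int iterations) throws InterruptedException {
        Padded a = new Padded(), b = new Padded();
        Thread t1 = new Thread(() -> { for (int i = 0; i < iterations; i++) a.value++; });
        Thread t2 = new Thread(() -> { for (int i = 0; i < iterations; i++) b.value++; });
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Exact totals: each counter has a single writer thread.
        return new long[] { a.value, b.value };
    }

    public static void main(String[] args) throws InterruptedException {
        long[] totals = run(1_000_000);
        System.out.println(totals[0] + " " + totals[1]); // 1000000 1000000
    }
}
```

Note that a JVM is in principle free to rearrange or strip unused fields; the supported mechanism is the @Contended annotation, which is why it exists.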
Now, among our three code generators we have different configurations for whether the thread local heap is batch-cleared or not batch-cleared. The trade-off here for batch clearing is a path-length trade-off, because Java has the semantics that when you do a new object, the object needs to be initialized with zeros.
So batch clearing basically says the whole thread local heap is initialized to zero to begin with, and this is done in a more efficient way. For example, on POWER we are using the instruction called dcbz, data cache block zero, so with one instruction we are zeroing a whole cache line. So you can imagine you have a 128-kilobyte thread local heap.
You only need 1K instructions to do the whole clearing. This batch clearing is done by the GC: when you ask for a new TLH, the GC will create it for you, and then you have the TLH handed in, and when your thread is doing new objects, you don't need to do the zero-initialization of the object, because the TLH was cleared already. So the trade-off here is: the GC will do the batch clearing, using a wider instruction to clear the TLH, versus zeroing in the JITted code.
Basically, whether you are write-through or write-back, and write-allocate or not, is relevant here; it is mostly related to write-allocate or not. If your D-cache is write-allocate, then when you do the zeroing (because the batch clearing is also writes), write-allocate means the whole TLH will be brought into the D-cache, and the D-cache is trashed multiple times: the D-cache is 32 kilobytes and the TLH typically is 128 kilobytes, so when you do that clearing your D-cache is trashed four times.
So basically all your warm data sitting in the D-cache is evicted, and that has performance implications for your later run. With write-through, no-write-allocate on POWER, because the level 2 is much bigger (for example, on P9 the level 2 is 512 kilobytes), you do a 128-kilobyte clearing and that's fine: it is 1/4 of the size of the level 2, you still have 3/4 of the level 2 there for the other important data, and the level 1 D-cache is never touched when you do the clearing.
So there is something here for x86 and Z, because they are write-allocate; that is the difference here. There is also an SMT-level consideration: at a certain SMT level you have four threads doing the zeroing, and then you are really thrashing, because each one is 128 kilobytes, so the four of them total 512 kilobytes. I have had this experience on SPECjbb2015: you really need to tune the TLH maximum size to be smaller, and it actually improves performance, by not trashing the whole level 2.
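The zero-initialization semantics that all of this clearing machinery serves is directly visible at the Java level (a trivial sketch; the names are mine):

```java
// The Java semantics that TLH clearing implements: a new array (like a new
// object's fields) is guaranteed to start out zeroed, whether the zeroing
// happened eagerly in a GC batch clear (dcbz on POWER) or at allocation time.
public class ZeroInitDemo {
    static long sumOf(int[] a) {
        long s = 0;
        for (int v : a) s += v;
        return s;
    }

    public static void main(String[] args) {
        System.out.println(sumOf(new int[1024])); // 0: every element is zero
    }
}
```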
Okay, going to the next slide. The next slide is atomic update and locking, and inter-thread communication. For Java, and for any programming language, inter-thread communication typically goes through your atomic operations, and in Java also through volatile variables, because a Java volatile variable has sequential consistency; later I will talk about that. Atomic operations can carry inter-thread communication because they are going to be executed atomically, exclusively, anyway.
Their behavior is: you do an update, for example an atomic integer increment by one, and when you do the increment by one, you basically assume that atomic integer can later be handed to another thread to do another atomic update. But locking is different. For locking, although you are using compare-and-swap to grab the lock, the occupation of that lock pretty much means this cache line should not be passed to the other processors, because you are holding it.
Even if they get that cache line, they cannot do anything about it, because you hold the lock. So it is different from fetch-and-add or whatever atomic updates you do: there, you can toss the cache line to another processor to do a further fetch-and-add. That is the difference here. So how do you differentiate these two behaviors? Locking uses compare-and-swap, and atomic update is using compare-and-swap as well. I don't think x86 and Z can differentiate them; we can differentiate them on POWER.
We have an instruction hint: on POWER, what you have is the load-and-reserve instruction, and when you do the load-and-reserve you can provide a hint. You encode the instruction slightly differently; the hint tells the processor that this instruction is intended for atomic update instead of locking, and then the processor will manage the cache line differently.
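At the Java level, the two behaviors being contrasted look like this (a sketch with my own names; both counters end up exact, but the lock case is the one where the cache line is best kept local to the holder):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Atomic update (fetch-and-add style, where the cache line may bounce
// between processors) versus a lock (compare-and-swap acquire, where the
// line is held until release). Same count either way; they differ in how
// the hardware should treat the contended line.
public class AtomicVsLock {
    static final AtomicInteger atomicCount = new AtomicInteger();
    static int lockedCount = 0;
    static final Object lock = new Object();

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                atomicCount.getAndIncrement();  // atomic update
                synchronized (lock) {           // lock-based update
                    lockedCount++;
                }
            }
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(atomicCount.get() + " " + lockedCount); // 200000 200000
    }
}
```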
The Java object lock has two states. One is the flat lock: basically there is a lock word in the object, indicating whether you are holding the lock or not. And then, when the lock is contended, when there are multiple threads contending for the same object's lock, it is going to be inflated.
If you have contention there, it is going to degenerate into the pthread mutex and the condition variable underneath; I didn't even talk about those details here. So the locking here, the Java monitor lock, basically has the two states: the flat lock and the inflated lock. Okay, so on to the next slide: memory consistency.
So what is memory consistency? You have a hardware memory consistency model and a software memory consistency model. A memory consistency model is a contract: the hardware memory consistency model is a contract between the hardware behavior and your program, and the software memory consistency model is a contract between the programmer and the programming language, governing what the behavior is. And it is all because of the cache: if there were no cache in the system, you pretty much would not have a memory consistency issue, and it would pretty much come down to sequential consistency, because you can imagine, with no cache, everything converges on a memory controller to do the memory operations on the memory.
Then you have a single funnel through to the memory. As long as your pipeline keeps the memory operations ordered, everything is funneled through the memory controller to the memory, and you don't have a memory consistency issue at all, because by definition that single funnel of the memory controller gives you sequential consistency. Sequential consistency is easy to understand, and that is the behavior then.
The other thing related to memory consistency is intra-thread ordering: within a single thread, what ordering behavior do people observe? That the observed behavior keeps the intra-thread ordering is an assumption, for sure, and implicitly true. This assumption is true on all processors I know, except, decades ago, a processor called Alpha, by DEC.
They had some behavior where the intra-thread ordering didn't conform to program order; they had some value speculation going on in their processor. But it can lead to very strange behavior when your intra-thread ordering doesn't conform to program order. Program order basically means the order of your instruction layout.
Then your behavior conforms to what you see in your instruction sequence; that is your program order, as observed from within the thread. But the memory consistency model is dictating the total ordering of the memory accesses in your system, and that can get into different models. So intra-thread ordering being program order is a standing assumption; otherwise you would have a lot of paradoxical, very strange behavior, and you would have causality problems there. But even when the intra-thread ordering is program order, there is more.
Intra-thread ordering is program order, but what is externalized is different: when your memory accesses go external, when they are put on the bus, if you like, their order doesn't have to conform to the program order. So internally you will observe your program ordering, but externally, other threads can observe that your accesses are not in your program order. That is the problem here.
So you have different hardware commitments for the memory consistency model (and, on the software side, a single model). Here I am talking about the hardware consistency model: x86 is TSO, total store ordering, and POWER is weak ordering. So what is the difference between them? I have an example here: x and y start off as 0, both variables 0, and two threads are doing things.
Basically, thread 1 will store 1 to x, then store 1 to y, and thread 2 will try to load y, and then load x, and it is going to put them into two different registers.
Now, after this program is done, what are you allowed to observe? For total store ordering it pretty much means that thread 2 seeing y and x as 1 and 0 respectively is impossible, because of this total store ordering memory model.
Observing y as 1 while at the same time observing x as 0 is impossible for total store ordering. Okay. But POWER is weak ordering, and weak ordering means these two stores, x = 1 and y = 1, can happen in any order, so any combination is possible on POWER.
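The litmus test from the slide, written out as a Java sketch (names are mine; with plain fields the Java memory model, like weakly ordered hardware, allows all four outcomes, including the y == 1, x == 0 result that TSO forbids at the hardware level):

```java
// Message-passing litmus test: thread 1 stores x then y; thread 2 loads
// y then x. With plain (non-volatile) fields, any of (0,0), (0,1), (1,1),
// and even (1,0) may be observed on a given run.
public class LitmusTest {
    static int x, y;
    static int r1, r2;

    static void run() throws InterruptedException {
        x = 0; y = 0;
        Thread t1 = new Thread(() -> { x = 1; y = 1; });   // store x, then store y
        Thread t2 = new Thread(() -> { r1 = y; r2 = x; }); // load y, then load x
        t1.start(); t2.start();
        t1.join(); t2.join();
    }

    public static void main(String[] args) throws InterruptedException {
        run();
        System.out.println("r1=" + r1 + " r2=" + r2);
    }
}
```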
For the second thread's reads, you can observe anything. Okay, that is the difference between TSO and weak ordering. And the next slide, the last slide here.
So now, going back to the language consistency model. Historically, C and C++ probably didn't even have a memory model; only in the respective 2011 standards did they add a formal memory model. So from that time you have a contract between the language and the program: when you write a program this way, you are guaranteed to see this behavior. In the past, if you wrote something concurrent in C,
good luck with your multithreaded program! The behavior was not guaranteed, though of course it typically worked out okay. Java, in its early days, already defined the Java memory model, really early, as Java Specification Request (JSR) 133. So given a program and an execution, you have a contract between your program and what actually happens when it runs: the execution needs to conform in a certain way.
So the Java language, to have a concise description of what the memory model means: it is basically sequential consistency of all volatile accesses, plus all lock regions being sequentially consistent as well, these two things put together, plus the intra-thread ordering conforming to your program order. These three things together govern the behavior of your program, and now you have the Java memory model.
In your hand you have the hardware behavior, TSO or weak ordering; from the Java runtime point of view, you need to guarantee that your Java program will behave as defined by the Java memory model. So, as I wrote here: no matter whether the underlying hardware is strongly or weakly ordered, the code your JIT compiler emits needs to behave as defined by the Java memory model. That is the contract. Okay!
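A minimal sketch of what that contract buys the programmer (names are mine): publish data, then set a volatile flag; a reader that sees the flag set is guaranteed to see the data, and the JIT must emit whatever barriers the particular hardware needs to make that true.

```java
// Sequential consistency of volatile accesses in action: the volatile store
// to flag orders the plain store to data before it, so a reader that
// observes flag == true must also observe data == 42.
public class VolatilePublish {
    static int data = 0;
    static volatile boolean flag = false;

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            data = 42;    // plain store...
            flag = true;  // ...made visible by the volatile store
        });
        Thread reader = new Thread(() -> {
            while (!flag) { }         // spin until the volatile flag is seen
            System.out.println(data); // guaranteed 42 by the JMM
        });
        writer.start(); reader.start();
        writer.join(); reader.join();
    }
}
```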
So in our JIT, the behavior is: you still need actual memory barriers for JVM safety. What I mean here is that you have implicit data in an object. For example, for an array object, in the object header you have the object type.
You have the array length; those two fields are implicit data in your Java object. And "for safety" means: if you didn't guarantee the ordering of these implicit data, you could crash the JVM. For example, you initialize your Java array to be length 100, but the other thread picks up, for that Java array object, a length of 1000, because the stores were observed out of order; then you pick up 1000, you are going to access something wrong, and you crash. Okay, that happens.
That happened not that infrequently on POWER, actually, and we need to insert the right memory barrier into the new-object instruction sequence, after the initialization instructions, to guarantee the ordering before you can publish your object reference to another thread. Otherwise, you can imagine, you initialize here and you publish it; that is the so-called publishing of your object reference.
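A sketch of safe publication at the Java level (names are mine): publishing the array through a reference with volatile semantics orders the header stores, including the length, before the reference becomes visible, which is exactly what the JIT's barrier enforces underneath.

```java
import java.util.concurrent.atomic.AtomicReference;

// The release/acquire pair on the published reference orders the array's
// implicit header data (type, length) before the reference itself, so a
// reader never sees a garbage length.
public class SafePublish {
    static final AtomicReference<int[]> published = new AtomicReference<>();

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> published.set(new int[100]));
        Thread reader = new Thread(() -> {
            int[] a;
            while ((a = published.get()) == null) { } // wait for publication
            System.out.println(a.length);             // guaranteed 100
        });
        writer.start(); reader.start();
        writer.join(); reader.join();
    }
}
```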
So I want to thank Julian for giving an overview of all three architectures. Not that many people on the team are able to give such a good discussion of the three architectures, all the subtle differences, where they are similar, and how they actually matter in our code generators. So I wanted to thank you, and thank you all for coming to this talk. Stay tuned for another talk next month. So thanks.