From YouTube: OMR Architecture Meeting 20200716
Description
Agenda:
* Add Dynamic Breadth First Scan Ordering to GC (#5377) [ @jonoommen ]
A: Right, so my name is Jonathan Oommen. For those that don't know me, I've been an IBM Runtimes GC developer for the last three years. Today I'm here to talk about my most recent PR for the Eclipse OMR and Eclipse OpenJ9 projects, called Scavenger dynamic breadth-first scan ordering.

Just a quick outline of what I'll talk about today: what is a scavenger scan ordering, which many of you will be familiar with, so I'll move through these topics somewhat quickly; what is dynamic breadth-first scan ordering, which for the sake of time I'll call DBFSO; what is the goal of DBFSO; the design and implementation; some results; and then a look at the code.
So, a scan ordering dictates the order in which objects in an object graph are scanned and copied during the GC. Currently within scavenger there exist two possible scan orderings: we have hierarchical scan ordering, which is the default scan ordering for gencon, and then we have breadth-first scan ordering.
So what is dynamic breadth-first scan ordering? It's an optimization for breadth-first scan ordering, packaged as a new scan ordering. Scavenger will still scan objects breadth-first, but DBFSO enables the recursive depth copying of hot fields, as marked by the JIT, immediately after the object that scavenger is currently copying that contains them. I have a small visual for this, so it'll be more clear shortly. It is currently developed for gencon, although the first main expected use for it will be in the balanced GC, where breadth-first scan ordering is still currently the default scan ordering.
So what is the goal of DBFSO? It is to improve the locality issues of breadth-first scan ordering. Just a brief overview of some basic locality principles. First, the 90/10 rule: a program spends 90 percent of its time in 10 percent of its code.
Spatial locality says that items whose addresses are near one another tend to be referenced close together in time, and then we have temporal locality, which states that recently accessed items are likely to be accessed again in the near future. Moving on, and building on locality with regards to hot fields and hot access patterns: a hot field is a field that is frequently accessed by instances of an object. Take a String object, say, where certain fields are accessed whenever the string is used.
So, according to the 90/10 rule, if 90% of the time is spent in 10% of the code, there are likely some very hot object access patterns that it would be great if we could exploit. And looking at locality, it would be great if we could have frequently accessed objects beside each other in memory, which will likely reduce cache misses.
So take a look at this very basic object graph. Let's say you have object A, which has two fields, B and C: 10% of the time B is accessed, and 90% of the time C is accessed. If you look at object C, 10% of the time field F is accessed and 90% of the time field E is accessed. And then, lastly, for B, 10% of the time D is accessed and 90% of the time E is accessed. So how can we optimize this object graph for locality? Ideally, in memory:
Looking at the root set: when A is copied, at the end of the copy we'll ask if A has any hot fields, and it'll say yes, I have C. So then, dynamically, what we'll do is recursively depth copy the hot fields of A. So after A is copied, C will be copied, and then, as you would expect during this recursive depth copying, as C is being copied, at the end of C's copy the same question is asked of C.

Okay, and then we'll recurse our way back up, and then, looking at the root set now that A is done, F would be copied, and we'll move on. A while after A is scanned, we'll have B, and then, while B is being copied, at the end (and you can probably see where I'm going) B will likewise ask: do I have any hot fields?
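To make the copy order concrete, here is a minimal C++ sketch of the recursive depth copy just described. All helper names (copyBreadthFirst, getHotFieldOffset, readObjectField and so on) are hypothetical; this is an illustration of the technique, not the actual OMR scavenger code.

    /* Minimal illustrative sketch (C++), not the actual OMR code: after an
     * object is copied, its JIT-marked hot field is copied immediately and
     * recursively, so parent and hot child end up adjacent in memory.
     * All names here are hypothetical. */
    omrobjectptr_t
    dynamicBreadthFirstCopy(omrobjectptr_t object, uintptr_t depth)
    {
        omrobjectptr_t copy = copyBreadthFirst(object); /* normal BFS copy */
        if (depth < MAX_HOT_FIELD_COPY_DEPTH) {
            uint8_t hotFieldOffset = getHotFieldOffset(getObjectClass(object));
            if (NO_HOT_FIELD != hotFieldOffset) {
                omrobjectptr_t hotChild = readObjectField(copy, hotFieldOffset);
                if ((NULL != hotChild) && !isAlreadyCopied(hotChild)) {
                    /* the hot child lands directly after its parent */
                    dynamicBreadthFirstCopy(hotChild, depth + 1);
                }
            }
        }
        return copy;
    }

For the graph above, this yields: A is copied, then its hot field C, then C's hot field, before the breadth-first scan resumes with the remaining fields.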
The result is frequently accessed objects beside each other in memory, which will likely result in fewer cache misses. So, at a high level, looking at the design and implementation, there is leveraging of existing information that's already collected. Applications consist of method compilations, and the JIT compiler is a tiered compiler, meaning that each method has a base compilation level and then the JIT can decide to optimize a method further based on various heuristics that relate to how frequently the method is being run. So, looking at the Testarossa compilation levels, we have cold, warm, hot, very hot and scorching.
So this hotness value is computed for every access in every compilation, for each field of a class, and once again this is also done as a compilation is promoted or optimized. So, for each field of a class, we can aggregate these hotness values for the field's accesses across all method compilations and get an approximate value for the hotness of each field of the class.
We can look at some of the code shortly. There are more performance benchmarks being run, but one initial result right now is with respect to SPECjbb2005 and SPECjbb2015.
A
So,
with
regards
to
getting
into
the
code,
I
wasn't
I
wasn't
entirely
sure
on
the
best
best
way
to
do
it.
But
I
have
I've
links
right
here
to
the
to
the
pull
request
for
the
eclipse
Omar,
and
it
comes
up
when
j9
projects
as
well
as
some
details,
were
the
more
important
aspects
of
the
can
be
found
just
and
then
I
also
have
a
touch
or
just
slide
representing
summarizing.
A
Some
of
the
key
data
structures
that
set
that
are
used
within
this
feature,
so
that
for
that,
for
the
presentation
that
would
be
all
and
but
Darryl,
is
there
any
way
or
any?
Do
you
know
how
you
would
like
to
move
forward
with
looking
at
some
of
the
code
or
perhaps
talking
about
some
of
their
design.
B: Well, I guess we can just pause here and see if there are any questions about what you've talked about, and if you want, you can certainly dive in and take people through a more structured walkthrough of the code. I know myself I have a question about something you talked about on slide 11, where you talked about aggregating...
A: That's right. There are a few different reduction algorithms I've been playing around with, but yes, an average would be one of them. I've also done a summation, and then an average of the summation, and I did also implement a minimum and a maximum, but for the sake of the initial commit I left those in just a private branch. But yes, that's correct: these are averaged among blocks.

B: Is the...
A: The aggregation is with regards to the block frequency solely. But if you look at the information that's stored for the field, after we've aggregated this block frequency, what we do is similar to what was implemented before in the old hot field implementation: we multiply this value by a factor. As of now it's more or less copied directly from what used to happen before for hot fields. So, for a compilation, this aggregated block frequency value is multiplied by one; for a hot method the aggregated value is multiplied by ten; and for a scorching method it is multiplied by a hundred. It has been looked into, potentially for the next phase of this, to try to get CPU utilization for each method, which would allow a far greater increase in accuracy, as methods at the same compilation level, warm for instance, can differ somewhat significantly, especially in applications such as DayTrader, where otherwise there would be minimal benefit.
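As a rough sketch of that weighting, assuming a hypothetical enum and function name (only the one/ten/hundred multipliers come from the talk itself):

    /* Sketch of the hotness weighting described above: the aggregated
     * block frequency of a field access is scaled by the compilation
     * level of the containing method. Names are hypothetical; how the
     * very-hot level is scaled was not stated in the talk. */
    enum CompilationLevel { LEVEL_COLD, LEVEL_WARM, LEVEL_HOT, LEVEL_VERY_HOT, LEVEL_SCORCHING };

    uint64_t
    weightedFieldHotness(uint64_t aggregatedBlockFrequency, enum CompilationLevel level)
    {
        switch (level) {
        case LEVEL_SCORCHING:
            return aggregatedBlockFrequency * 100;
        case LEVEL_HOT:
            return aggregatedBlockFrequency * 10;
        default:
            return aggregatedBlockFrequency * 1; /* base compilations */
        }
    }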
B: So I seem to recall that you created a pull request several months ago for this, and there was some discussion around how the hotness information was communicated from the JIT to the GC. I don't remember what the resolution of that was. How is the hotness information exchanged with the GC?
A: For all the VM decisions, Tobi was my go-to for expertise; for all JIT-related questions, Andrew Craik was my go-to; and then for GC, when it came to architecture, Aleksandar Micic was the one I contacted. So this was the best design that we collectively came up with. Looking at a J9ClassLoader (this came from speaking with Tobi): each class loader will have a hot field pool, which will hold all hot fields related to all classes for that class loader, and then we'll have a global pool of what we've called J9ClassHotFieldsInfo. So we'll have a global pool of these objects, and each class has an initially null hot fields info; then, as its first hot fields are discovered, we'll initialize this pool element.
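A rough sketch of those structures follows; the field and type names are illustrative stand-ins, and the real definitions live in the OpenJ9 sources linked from the pull requests.

    /* Sketch of the data structures just described; illustrative only. */
    struct HotField {
        struct HotField *next;    /* linked into the class's hot field list */
        uint8_t offset;           /* slot offset of the hot field           */
        uint64_t hotness;         /* aggregated, weighted block frequency   */
    };

    struct ClassHotFieldsInfo {   /* one per class, from a global pool */
        struct HotField *hotFieldListHead; /* head of the list; entries
                                              live in the loader's pool */
        uint8_t firstHotFieldOffset;  /* cached for scavenger; U8_MAX if unset */
        uint8_t secondHotFieldOffset; /* cached for scavenger; U8_MAX if unset */
    };

    /* conceptually, each J9ClassLoader carries a pool of HotField
     * entries for all classes it owns: */
    struct ClassLoaderHotFieldData {
        void *hotFieldPool;       /* pool of HotField for this loader */
    };

    /* each class starts with a NULL ClassHotFieldsInfo pointer and is
     * assigned one from the global pool when its first hot field is
     * discovered */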
What we'll do is go through the J9 hot fields info pool; we only store the head of the list, so we'll iterate through all these objects. As you can see, the hot field list head just points to the head of the list within the hot field pool of the class loader. So what we'll do is iterate this list and find...
Yes, that's correct. When scavenger is copying, it asks for the first hot field offset and the second hot field offset, and then, based on that, it will recursively depth copy the hot field. There's a special U_8 value, U8_MAX, that lets scavenger know that there currently doesn't exist a hot field there. And then, working back a little bit to your initial question regarding the issues of the first implementation...
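The scavenger-side lookup might look roughly like this sketch, with U8_MAX used as the "no hot field" sentinel as just described (the struct and helper names are hypothetical):

    /* Sketch: while copying an object, scavenger consults the class's
     * cached hot field offsets; U8_MAX marks "no hot field here".
     * copyHotFieldAtOffset() would perform the recursive depth copy. */
    void
    depthCopyHotFields(omrobjectptr_t copy, struct ClassHotFieldsInfo *info, uintptr_t depth)
    {
        if (NULL != info) {
            if (U8_MAX != info->firstHotFieldOffset) {
                copyHotFieldAtOffset(copy, info->firstHotFieldOffset, depth);
            }
            if (U8_MAX != info->secondHotFieldOffset) {
                copyHotFieldAtOffset(copy, info->secondHotFieldOffset, depth);
            }
        }
    }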
C: As my understanding goes, it looks like this hot fields info is like a static for every class instance, meaning if there is a new instance of this class, all the hot fields are inherited from whatever values they were previously. So what about the difference between, say, a class that has two or three fields and another class that has eight, nine, ten fields? How does this behave in those scenarios?
A: So, as of now, there's a hot field max: that list's length is capped at ten. But there's also a certain threshold during the hot field marking passes, a certain threshold of block frequency that has to be met in order for that field's information to be stored, and that's to avoid the possibility of having excessive amounts of hot fields. So, for each new object...
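A small sketch of those two guards, the length cap of ten and the minimum block-frequency threshold (the threshold constant here is a placeholder, and the names are hypothetical):

    /* Sketch of the recording guards described above: a field is only
     * recorded if its block frequency clears a threshold, and a class
     * keeps at most ten hot field entries. Values are placeholders. */
    static const size_t MAX_HOT_FIELDS_PER_CLASS = 10;
    static const uint64_t MIN_BLOCK_FREQUENCY = 1000; /* placeholder threshold */

    bool
    shouldRecordHotField(size_t currentListLength, uint64_t blockFrequency)
    {
        if (blockFrequency < MIN_BLOCK_FREQUENCY) {
            return false; /* too cold to be worth storing */
        }
        return currentListLength < MAX_HOT_FIELDS_PER_CLASS;
    }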
B: Okay. So you said that you were maybe prepared to take people through the code; I think that would help for those that are going to be doing the review. Unless people really want that level of detail, I don't know if we need to go that deep. But if somebody does want John to do that, by all means, please speak up.
D: Just one last question, or one question: how consistent are you finding the results from this from run to run? Because there's a certain aspect of this where compiles that happen earlier sort of influence compiles that happen later, and there's some non-determinism in the order in which compiles happen anyway.
A: Quite consistent, very consistent values. Looking back at the analysis that I did (I'm just trying to remember), if you look at, for example, these results, each run would have been roughly within one to two percent of all the values. So if you look at the percent improvement, sometimes it varied between, let's say, six and eight, or seven and nine.

And if you look at the improvements from the perspective of max pause times, they tended to be within a thin margin. Along with this, compile time, footprint, throughput and all that was analyzed, and all of those have shown very consistent values.
A: It is perhaps a month dated, but at several points I did perform an analysis on all the fields and all their hotness values from run to run, and on the overall two hottest fields. I think from run to run there was some variance among the ordering of the hotness of the fields, but when it came down to the two hottest fields, those were consistent every time with the benchmarks I looked at.
A: Yes, I did find some. I found some that had, let's say, three or four hot fields, but then some with fewer. So I think there's room, I really believe there's room, for plenty more optimizations to this moving forward. But yes, there were some that had at least three fields that were quite hot.
B: I guess the trade-off would be the added footprint cost of storing that frequency information for a couple of extra fields. It just might be useful, or interesting, from a "see what kind of performance you can get" perspective, to increase the size there and see how it affects some of these workloads that you're running, if you capture more than just two fields.
A: Yeah, that for sure would be a possibility. I'm not sure how wide the benefits would be, though. If you look at, let's say, this object graph, and now let's say that both B and C were hot fields.
So in that scenario we're now copying two fields. And if we're talking depth, I have the depth set to three, so it'll copy to a max depth of three. But if, within that, you're copying to a depth of three for hot fields, and you go along this path, I believe that even if you extend that to a third hot field, you'd be outside of the cache line anyway, so I'm not sure what benefits would be inherited from extending further. That was my thought when I saw some results that had three, or a third, hot field. Does that make sense? Yeah.
A: There's an adaptive sorting that takes place. I have it now (I experimented with some different values) so that for the first 200 scavenges, on each scavenge, a quick sorting of all the hot fields takes place, and that allows for adaptability of the hot fields. Then, moving forward, for the rest of the duration, every so often (I can't quite remember the value, but every X amount of GCs) we increment that adaptive sorting interval. So, for example, by GC 1000, you're probably sorting every six or seven GCs. That was done from looking at various benchmark runs: as the life of the program went longer and longer, there wasn't much need to continue to sort every GC.
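A sketch of that back-off schedule: sort on every one of the first 200 scavenges, then stretch the interval as the run ages. The growth rate here is a placeholder, since the exact constant wasn't recalled in the talk.

    /* Sketch of the adaptive sort schedule described above. Assume
     * *interval starts at 1 and *nextSortAt at 201; both constants and
     * the growth rate are placeholders. */
    bool
    shouldSortHotFields(uint64_t scavengeCount, uint64_t *nextSortAt, uint64_t *interval)
    {
        if (scavengeCount <= 200) {
            return true; /* adapt quickly early in the run */
        }
        if (scavengeCount >= *nextSortAt) {
            *interval += 1; /* sort progressively less often */
            *nextSortAt = scavengeCount + *interval;
            return true;
        }
        return false;
    }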