From YouTube: 2020 11 19 Memory Team 2GB Sync
Description
0:20 Looked into Puma 5.0 experimental features
- nakayoshi_fork https://gitlab.com/gitlab-org/memory-team/memory-team-2gb-week/-/issues/6#note_450184965
- fork_worker
11:52 Outcome - single mode is better
- Trade-offs
15:00 Puma 5.0 discussions
18:00 Reforking discussions
- Is there a way to have the refork benefit without having to refork
21:42 Looked at 20 RPS in 2GB
28:00 Deduplication
36:20 Heap usage of Puma process
- Heapy
39:15 pmap Puma single mode
41:15 Understand what calls malloc and how much memory is allocated
- Looked at jemalloc and jeprof
47:00 Benchmark settings discussion
- The tested settings are super aggressive
- The settings still need to be tuned
B
The idea is, I think, Kamil is doing similar stuff without nakayoshi_fork. The idea is to run GC a couple of times before forking into workers, on the master process, and then do a compaction at the end. It was giving me some improvement in memory — I was looking at around 90 megabytes, sometimes less, depending on the experiment I was running — and it's still worse than single mode.
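What B describes here — a few full GC runs plus a compaction on the master before forking — can be sketched in plain Ruby. The method name is illustrative; the real experiment hooked into Puma's master process, and nakayoshi_fork does something similar internally:

```ruby
# Sketch of "run GC a couple of times, then compact, before forking".
# Illustrative name; the actual experiment ran this in Puma's master.
def tidy_heap_before_fork
  # Several full GC runs so short-lived objects die and object ages settle.
  4.times { GC.start(full_mark: true, immediate_sweep: true) }
  # GC.compact (Ruby 2.7+) defragments the heap so forked workers can
  # share more unchanged pages via copy-on-write.
  GC.compact if GC.respond_to?(:compact)
end

tidy_heap_before_fork
# worker_pid = Process.fork { run_worker }  # fork only after tidying
```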
A
Good, yeah. By the way, Aleksei, I did the same thing, right — you're aware of this, right? Just because we said we want to make sure we're not overlapping our work.
A
I looked into what it buys us if, just before we fork into Puma workers, we run a compaction, including two major GC runs.
B
You're talking about this — yes, I know this. I wasn't doing any deep research on the actual fork; I just enabled it and measured it. As I mentioned, you and Kamil are doing the same, but, let's say, manually, whereas this one is included as a setting of Puma.
B
Yeah, no, not a special version, just a setting: you set nakayoshi_fork to true, and on Puma 5.0 it starts doing the thing. It gives an improvement, as expected, but it's still worse than single mode. I also looked into fork_worker. The idea is to not use the master process in the traditional way, but to fork from the first booted worker, and then maybe refork from worker zero into the other workers.
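Both experimental features under discussion are plain settings in a Puma 5.0 config file. A sketch of what such a `puma.rb` could look like (illustrative values, not the benchmark configuration; as noted later in the call, the two features were not used together):

```ruby
# puma.rb -- Puma 5.0 experimental settings (sketch, not the real config)

# nakayoshi_fork: run GC (and compaction where available) in the master
# right before forking workers, to promote copy-on-write page sharing.
nakayoshi_fork true

# fork_worker: boot the app in worker 0 and fork the remaining workers
# from it instead of from the master; `refork` later re-forks warmed
# workers from worker 0. (Mutually exclusive with the setting above.)
# fork_worker

workers 2
```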
B
It also was only a minor difference in my experiments. When I was running it, I didn't warm it up with GPT. The difference was comparable to the amount of USS memory of the process, like less than 100 megabytes. I didn't really look into the heap, because I wasn't able to get to it yet, so those are my thoughts after it.
B
Kinda, but they call it a bit differently. You see, there is this picture: you basically boot worker zero first, then fork the first worker from it, and then the rest of the workers. So you are not using master in the traditional way — you kind of start forking from worker zero, and then you keep killing all the others.
A
I think it basically uses one less process — that's the selling point, as I see it.
B
Yeah, but you're not loading everything in master, right. Maybe I don't understand this, but I believe master is just for orchestration here. It isn't doing the same as master in, let's say, a typical master-plus-worker setup. In a typical master-plus-worker setup, master loads the whole application, and then we fork into workers. But here master is very lightweight: while I was observing it, it was 40 megabytes tops. All the preloading happened in worker zero, so master wasn't doing anything — it was just orchestrating.
B
Yeah, as you can see — basically, no, it preloads the app, but into worker zero, as far as I understand. So let me check... yeah. What I see, even before touching the application with the first request — unless somebody else was hitting the application with a first request in the background — I saw that worker zero had become quite heavy.
A
So if you're starting a new worker, you always fork from a clean slate, right. A new worker in that model will never have pages loaded that you might have to load to actually service a particular request, whereas in this model the master will. But it will be much less useful for those applications that preload the whole app anyway, right, because...
A
It does not mean that everything that could be in memory up front is actually in memory up front, right. We can see that because, just after forking, if the first worker or the first batch of workers goes live and I send a single page request to the front page, it will start shoveling more data into memory. So...
D
Yeah, it's basically lazy initialization, and I'm just curious: do we load some pieces of the code because of that, that are not preloaded the regular way — like our autoload, when you request part of the classes that are not yet referenced? And the second one is how much strong memoization happens that may impact our memory structure, which is going to impact copy-on-write as well.
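The strong-memoization concern can be illustrated with a small sketch (all names here are hypothetical): a lazily memoized value is allocated on first use, so if that first use happens inside a forked worker it dirties pages that could otherwise stay shared; touching it once in the parent, before forking, keeps the allocation on pages every worker can share.

```ruby
# Illustrative sketch: lazy memoization writes memory on first use,
# which breaks copy-on-write when it happens after fork.
class AppConfig
  def self.settings
    # ||= allocates and stores the hash the first time it is called.
    @settings ||= { "external_url" => "http://example.com", "rps" => 20 }
  end

  # Hypothetical warm-up hook: touch memoized values in the parent
  # process so the allocation happens before forking.
  def self.warmup!
    settings
    self
  end
end

AppConfig.warmup!
# Process.fork { ... }  # workers now share the already-memoized pages
```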
B
No, no — they're two completely different pictures, and they're not compatible, as far as I know. I wasn't able to run both nakayoshi_fork and fork_worker and see the results. Maybe I did something wrong, but I wasn't able to pair them up. But my outcome from this is that single mode is probably the way — I mean, it's much better. It's like more than 100 megabytes better than what I see in any of these experiments.
B
I don't really think I could do anything more without looking at the heap at this point, because these are just not fully proofed experiments — I just ran them and observed the graphs, but...
D
Well, yes, performance will be worse, right, but this is kind of the trade-off that you are describing. I guess it's about the scaling — how many CPUs you expect on the 2 GB instance, and how beneficial using cluster mode is, which is really for the multi-CPU installs.
B
I don't know — probably. To be honest, when I was looking at the GPT results, it wasn't really that much worse. More tests failed, but it was, let's say, two seconds instead of one second, or one point five, plus a failed test. So I found this acceptable on a single worker, and...
D
On this aspect: we are using our fork because the older version didn't have my fix to the performance, but this is an experimental feature, so you can configure it. So nothing is really holding us back; it's more about the time to ensure that it works properly.
B
So I think I will probably, I don't know, pair with Nikolay or Matthias to learn more about the heap today, because I think that, even to continue with Puma, at this point I should start looking into the heap difference, I mean.
D
So I'm kind of wondering, because I'm kind of set on single mode for these very small instances, but there is also this refork thing. I just wonder whether maybe, as a separate phase, this is something for us to figure out — maybe not this week, but in the future — to actually understand how much benefit it brings us when running in this cluster style.
A
There are options with how we roll this out, right. You can do rolling deployments, where maybe you deploy a new worker and then let it warm up in the background while your old workers are still servicing traffic, and then you might switch over to the new fork. But I don't know how — in environments where we don't control how this should run in production, or where we basically have no idea what kind of traffic we will service, this might be really hard to tune, I think.
D
I'm kind of curious about this reforking: how stable is it, first how much benefit it brings, and what side behaviors it has when running in production. Because I think it is super clever, but it's also kind of different and more risky compared to how we ran before.
D
Yes, and nakayoshi_fork is pretty much safe, really, I guess, because it's pretty straightforward. But I'm just wondering if there is some way for us to have the refork benefits without doing the refork, so within the current model — and this is what I'm also interested in.
D
This is what I'm kind of trying to think through: what is being loaded after the worker initializes? What happens in the window between when you fork and when you process a single request, where there is this spike in memory consumption — what operations happen during that time? Can we somehow make them happen beforehand, and how much benefit would that bring? Because if we can somehow make this work without the reforking, it would probably be a much safer way to implement it, and maybe we could save another...
D
So I'm really curious what happens, and this can probably be done with an execution profile: what happens between our fork and processing the request — pretty much any request. Actually, I'm now thinking that we could really do an execution profile of what is being executed — I mean a flame graph, maybe, but also, secondly, an execution profile of what methods are being hit during that time, between when you fork into a new process and when you execute the first...
D
Sure. So, of all the items I looked at, I looked at that 20 RPS; it's pretty nasty for running in the 2 GB instance. One of the reasons — and I kind of got to understanding why — is that I see pretty nice Puma and Sidekiq memory usage over time, but a very steep increase in swapping at some point, and I noticed that it's due to git processes spawned from Gitaly.
D
It basically spins up like 20-30 git processes because of the unlimited concurrency, which leads me to the conclusion that on this small instance we should limit the git concurrency as well, to ensure we control how much is being spun up. And why it is bad: because these git processes, in the GPT runs, are for the big repo, and they consume a ton of memory.
D
...memory being used, because the git repo is mmap'ed, but it's still pretty much evicting the application, but...
D
How am I getting data from the repo? I'm not talking about Rugged — Rugged is something to kill at the soonest possible moment. I'm talking about Gitaly. Gitaly can open the repo directly in the Go process using libgit2, and they implement some of the functions that way. They also have a legacy way of accessing the git repo, which is through gitaly-ruby, a side process that is running, that they communicate with.
A
It sounds like the image scaling problem all over again. Remember, we had to put kind of a rate limit in place for the image scaler as well, because we just fork on request.
E
...regardless of the resources that are available to you. And how much memory does each git process consume, just roughly — do you remember? It really depends on the repo that...
A
...of Gitaly, yeah. Thanks for sharing that — I had no idea it worked that way. I was kind of assuming, oh, what a nice way of doing it, you know, everything runs through gRPC — and apparently not. It's good to know.
D
However, I was still unable to see those pages in the Gitaly — in the GitLab process — so I may continue looking at that. I noticed it in my findings when I was running Sidekiq, but I was not noticing it on the GitLab process, so I'm not sure yet. I need to investigate exactly whether it's due to pages that are maybe fragmented, or maybe it's due to some other reasons; I don't know yet.
D
I also looked at something which, I think, is very close to what Matthias did.
D
I kind of looked at each String object — at the frozen ones, the unfrozen ones, and the ones that can be deduplicated — and I noticed that we can really deduplicate about 25 megabytes of strings, basically, in the Ruby processes of GitLab, which seems like a pretty big number. I just haven't yet found an easy way to deduplicate everything: I can iterate over some objects and deduplicate them, but some objects are frozen, which makes it impossible to deduplicate them.
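A simplified sketch of the kind of measurement described above, using ObjectSpace to count duplicate live strings and estimate the reclaimable bytes. This is an approximation, not the exact script used in the experiment:

```ruby
require "objspace"

# Estimate how many bytes duplicate live Strings occupy: walk all live
# Strings, bucket them by content, and sum the sizes of the extra copies.
def string_dedup_stats
  strings = ObjectSpace.each_object(String).to_a
  counts  = Hash.new(0)
  frozen  = 0
  strings.each do |s|
    counts[s] += 1
    frozen += 1 if s.frozen?
  end
  reclaimable = counts.sum do |str, n|
    n > 1 ? (n - 1) * ObjectSpace.memsize_of(str) : 0
  end
  { total: strings.size, unique: counts.size,
    frozen: frozen, reclaimable_bytes: reclaimable }
end
```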
D
It's not really straightforward: I didn't find a way to, let's say, update all the memory references of all the underlying structures to perform this kind of deduplication natively. I'm thinking that, in general, GC deduplication of strings could be a really nice addition to Ruby, because it's actually free memory to reclaim, without any performance penalty, without any impact on the running application at all. And it scales kind of linearly with the application — the bigger the application is... and, Matthias...
A
It kind of depends on what you look at; that's the main thing that kept confusing us. I looked at two things, two days ago or so. One was summing up ObjectSpace.each_object, and that came down to, I think, 190 megabytes for me. And then I looked at the GC stats as well.
A
And there you can sum up the number of allocated pages, which might not be full, right, so that's kind of an upper bound. It's an upper bound for the current usage, but it should be representative of RSS. So if you multiply that by the page size, which is 16k, I also got around that number, like 150 to 190 megabytes, which...
A
...is well below the actual memory used. But if you look at reports from — what was it, derailed? I think, Nikolay, you spent some time with derailed — it was reporting 900 megabytes used or so, so it must be looking at... I don't know yet what it does, but it must be looking at other stuff as well. And heapy, I think, was considering that as well.
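The heap-page arithmetic described a moment ago can be written down directly (the 16 KiB page size matches the figure quoted above for the Ruby versions discussed; the fallback constant is only a guard for older Rubies):

```ruby
# Upper-bound estimate of the Ruby object heap from GC.stat: allocated
# pages (which are not necessarily full) times the heap page size.
page_size  = GC::INTERNAL_CONSTANTS.fetch(:HEAP_PAGE_SIZE, 16_384)  # 16 KiB
pages      = GC.stat(:heap_allocated_pages)
heap_bound = pages * page_size
printf("%d pages x %d bytes = %.1f MiB upper bound\n",
       pages, page_size, heap_bound / 1024.0 / 1024.0)
```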
D
...the worker. So then I also looked at this instrumentation — it was basically commenting out one line of code — and it gave me the idea that just removing that alone should give us probably about half a meg of reduced memory usage, basically for free. I also looked at the Prometheus client, and I remembered exactly how it behaves and how it's implemented: basically, the Prometheus client can also quite significantly increase memory pressure over the application's run, and I'm sure...
D
Maybe this is also one of the reasons why the usage grows over time, and why it increases the initial memory usage: different types of metrics — histograms, counters, gauges, with different aggregation types — each of them needs an individual mmap file, basically, which is owned by the process, and the current minimal mmap file size is like 4 megs.
D
I didn't get an exact number after disabling Prometheus, but I think I saw something around 60 or 80 megabytes less when disabling Prometheus metrics. So I guess this is another avenue for us to really investigate our Prometheus integration, because I think it can be a pretty big chunk of the memory usage in the processes.
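A back-of-the-envelope sketch of the mmap cost just described, with illustrative numbers: the 4 MiB figure is the minimum file size mentioned above, while the process and metric-file counts are made up for the example.

```ruby
# Illustrative estimate: prometheus-client-mmap-style storage keeps one
# mmap'ed file per metric family per process, each at least MIN_MMAP_FILE.
MIN_MMAP_FILE = 4 * 1024 * 1024  # 4 MiB minimum, as mentioned above

def mmap_reservation(processes:, metric_files_per_process:)
  processes * metric_files_per_process * MIN_MMAP_FILE
end

bytes = mmap_reservation(processes: 4, metric_files_per_process: 4)
puts "#{bytes / 1024 / 1024} MiB reserved"  # 4 * 4 * 4 MiB = 64 MiB
```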
A
There was even an outage, right, a couple of months ago, where we were running out of memory in production, and they tracked it down — I think I floated it in the channel back then. Stan did a super interesting debugging session where they looked at a raw, like a core dump, basically, of what was going on in one of these processes, and it turned out there were massive strings of Prometheus metrics.
A
We were trying to push them out through the exporters, which consumed a ton of memory, and they made a change. I forgot the exact reason why these strings were so large, but apparently there was a bunch of information in there that we didn't actually need to export, so they fixed it that way. But maybe there's still, you know, more opportunity to look at. So maybe that's actually exactly the same reason you're seeing this.
D
So I guess this is something really for us to look at, and to figure out how it can be done better, more efficiently, or anything, because I think we are kind of at the limits of scaling with the incoming Prometheus metrics. Even scraping these Prometheus metrics takes, I don't know, on GitLab.com, something around five seconds, basically. So it's pretty extensive, it's pretty...
D
...heavy on memory. Maybe we could somehow accumulate these metrics somewhere else; I'm not sure how to approach that. But I think this can really be a big part of the memory usage that increases over time, given that we basically store multiple metrics per single controller as well. So I think this could also be one of the reasons why we see a pretty, let's say, reasonable usage of memory on the Ruby process, but there is a lot of other memory leaking outside. I mean, not...
D
...it's kind of being malloc'ed, or done outside of Ruby. The Prometheus client today is written natively in C, so it's mmap-based and pretty efficient, but you cannot recycle that memory. That's the problem as well, and it actually has pretty severe consequences for long-running processes, where it increases memory kind of indefinitely.
A
Yeah, I'm cool, thanks. I guess I can do it together with Nikolay, because we worked together most of the day. So, we wanted to drill more into the heap usage of the Puma process in general — what's going on — and what we spent most of the time on was looking at a heap dump generated with ObjectSpace.
A
There were some discrepancies, because it's a bit tricky: you need to trace allocations first, and depending on when you start tracing allocations, you might not actually catch, you know, all the requires or whatever — you basically need to do it before your app does anything. So it wasn't super easy to get a full account. But anyway, by default this is just a JSON dump, like a JSON blob.
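The dump workflow described here, as a minimal sketch: start allocation tracing as early as possible (ideally before the app loads anything), then write the heap out as newline-delimited JSON that heapy can aggregate afterwards.

```ruby
require "objspace"
require "tmpdir"

# Trace allocations from as early as possible so that file/line
# information is attached to objects allocated afterwards.
ObjectSpace.trace_object_allocations_start

sample = Array.new(100) { |i| "allocated-while-tracing-#{i}" }

# Dump every live object as newline-delimited JSON, one object per line.
dump_path = File.join(Dir.tmpdir, "heap-#{Process.pid}.json")
File.open(dump_path, "w") { |io| ObjectSpace.dump_all(output: io) }
puts "heap dump: #{dump_path} (#{File.size(dump_path)} bytes)"
# afterwards, aggregate with the heapy gem: `heapy read <dump_path>`
```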
A
Basically, it's fairly readable, but it's not aggregated, so there's a tool called heapy that we used to run over that data, and it gives you an interesting breakdown of where memory is allocated; it can trace it back to files as well. That's all updated in the issue. It took a while to get there and to actually get this.
A
In the issue there are like a dozen different sections that we can try to wrap our heads around, and we then proceeded to...
A
One thing we observed was that there is a big chunk of memory that is reported as just anonymous pages — in-RAM pages that are not backed by files — and we don't really know what's in there. One thing we wanted to find out was to what extent things like shared libraries and external code that is not Ruby contribute to this.
A
So one thing we did was look at the memory mapping (pmap) data for a running Puma worker, and that is summarized in a spreadsheet that maybe I can quickly show, just super quickly. I mean, unfortunately that's not a big takeaway, but maybe it's interesting to see.
A
Yeah, only a very tiny amount. So this is the raw data from — by the way, we ended up changing the setup: this came from the Omnibus VM, but we changed it to single mode, because it was quite difficult to reason about things like page sharing if you have three different processes to look at. We wanted to see first, just in terms of what the memory layout is, a single process, so this is now from a single-node, single-worker Puma. And you can see — I just created a pivot table based on the mapping. Or maybe I should say first: this mapping column tells you — blank basically means it's not backed by a file.
A
So
it's
not
it's
just
yeah
directly
map
memory.
That
is
just
in
in
memory
for
that
in
ram
for
that
process,
and
this
is
the
aggregation
by
the
mapping-
it's
not
always
a
file,
so
so
this
blank
here
is
basically
the
sum
of
all
the
memory
that
is
not
file
backed.
They
get
all
ended
up
kind
of
up
here,
that's
just
the
total,
but
yeah.
You
can
see
that
the
vast
majority
like
is
not
backed
by
by
a
file,
but
I
think.
A
But I think it was a useful exercise to see what kind of thing contributes to memory. All of this is quite tiny, though, so it probably doesn't make sense to look at external libraries. But yeah, if you have a look at it, it might be interesting to see just what's there.
A
There is actually stuff that we map into memory that is not specifically Ruby code, so I thought it was interesting to look at what else there is. Oh yeah — we also were wondering: if there are discrepancies between what ObjectSpace tells us is being allocated on the heap, maybe there's a ton of mallocs going on that are not reflected in these reports. So we wanted to see whether maybe there's a way to understand what calls malloc, or from where we call malloc, and how much memory is allocated that way.
A
We have no idea if it's going to lead anywhere, but we looked at jemalloc again, which is the allocator that we use, and it has a profiling companion tool called jeprof. We compile jemalloc with profiling support enabled, which is good — that's just configured like that in Omnibus already — so we created a profile dump. You can have a look at that in the issue. It's just totally illegible by default, because it does not...
A
It
does
not
like
the
the
symbols
like
where
this
is
being
called
from.
It's
just,
I
think,
an
address
in
memory,
so
it's
unusable.
But
apparently,
if
we
didn't
get
around
to
do
that
yesterday,
there's
a.
A
...way to unwind these and map them back to the actual symbol that was used; apparently that's preserved somewhere. So I'm hoping that if we try that again — at the end of the day, there's a parameter you can run it with, --symbols, which should resolve the raw addresses against a symbol table — we can actually see what the method, or I guess the function, was that we were calling. So maybe...
A
Yeah, so the problem is these heap dumps are massive, right. It's a sampling profiler, and you can tell it the interval it will sample at — every so many kilobytes or megabytes, whatever you tell it — but it's quite large. It produces large files, and a lot of them; I think the first heap dump we took was like 20 files or something. That was yesterday. So, okay, we can talk about today later.
A
I would still like to understand where all these big differences come from. I think it would maybe help us to understand in what sense heapy and derailed seem to give us a more complete account of the memory spent. So what is that difference between this and what we were seeing when we were looking at GC stats and ObjectSpace results separately? I think that would work. You said you tried this already.
A
It was difficult to work with the result of this, but, I mean, who knows. Then, as I mentioned already, we have a lot of data collected, but — I don't know about Nikolay — I haven't spent a lot of time yet on actually looking at all this stuff, because there's a lot now. We have all these derailed breakdowns, and I think there's the heapy breakdown somewhere as well, which could be interesting.
A
So I think it would be good to maybe start summarizing somewhere, because we have a lot of these raw data dumps in this issue, and it takes more and more time to scan through all this stuff; it's a bit overwhelming.
A
We should try to draw conclusions and highlight the interesting bits, maybe even in a separate place, I don't know. So I think that's something we should spend some time on. Yeah, that's all I can think of for now.
C
Yeah, I agree with you. I don't know about the jemalloc dumps; I will just try to fix the library and see why it's stuck. I will really time-box it, to see if we get some readable data — but I will really time-box it. But in that issue, as you said, we collected a lot of data from different tools, so we should start trying to make some sense of those data and see.
D
Maybe, I don't know, disabling Prometheus metrics instrumentation is also kind of predictable: we lose some observability, but it kind of reduces memory pressure. I mean, if you disable that, Matthias and Nikolay, you should see a slightly different pattern, but then you know that this is due to Prometheus being disabled. So this can remove some noise from your testing as well.
B
I also found an article on the deduplication, by Sam Saffron, which I'm curious about. He mentioned some limitations: for example, if a string gets an ivar in Ruby, it won't allow us to deduplicate it. Do we have the same limitation in what you, Kamil, mentioned? He also mentioned, for example, tainted strings, which I don't have experience with — he mentioned that there are a lot of duplicated strings which are tainted.
D
Yeah, so actually, if you look at the issue that I created, I posted some results there. There are actually very few strings that would be blocked that way — the strings that have an ivar assigned, there are just a few of them. There's a bunch of unfrozen strings, and we could probably freeze these strings by looking at the libraries that don't freeze them and contributing, which could result in another, probably around 10 megabytes of reduction in the number of strings.
D
So actually, I think if we approached that comprehensively, we could be looking at 30 to 35 megabytes in total per process.
D
Yeah, so the frozen objects are... the objects that are not frozen — I mean, frozen, no.
D
This is the reason I'm not 100% confident of that, but I'm assuming that a bunch of these strings can be frozen — maybe half of them, I don't know how many. But there are like 22 megabytes that can be removed from the existing frozen strings, basically just by doing an extension to the Ruby GC. Basically, nothing...
D
...else. Okay, it seems like — I don't know, probably — if you contributed this to Ruby, it would be a piece of code of something between 100 and 500 lines of C, doing some magic, and that deduplication would reduce about 50 megabytes without any application changes — basically just a Ruby runtime change.
D
I mean, as for me, I'll kind of continue with these investigations. I'm not sure where they're going to lead me, but I'm just still curious. Maybe I'll finally look at this loading of the application and understand what is happening.