From YouTube: GitLab 15.2 Kickoff - Enablement:Memory
Description
Kickoff for the Memory Group for the GitLab 15.2 release
Planning issue: https://gitlab.com/gitlab-org/memory-team/team-tasks/-/issues/117
Memory Group Past Kickoff Videos: https://youtube.com/playlist?list=PL05JrBw4t0Kq1HDOIfQ8ov6lfyJkWK2Yr
Presentation by: Yannis Roussos, Sr. Product Manager, Memory and Database Groups
Puma is the web server that we are using in GitLab, and we have observed that the memory of the Puma servers, the Puma pods, keeps on growing when they are not restarted. This is most evident during weekends, when we don't do deployments, because during deployments the pods are of course restarted and the memory is cleared. But during weekends you can see it in the blue line: the memory keeps on increasing, in contrast to the yellow line.
So this is a clear indication that we have a potential runaway memory issue. In 15.1 we investigated this problem, and our investigation generated multiple findings and led us to three separate paths that we want to keep working on during 15.2. The first one is, of course, the reason why we started this initiative: we want to find the origin of the growing memory use of Puma when the pods are not restarted, and fix it.
The second path is that we want to add ways to gather more data from production servers, increasing our visibility into those problems and allowing us to diagnose these and similar issues. The main thing here is to add Ruby heap fragmentation metrics: as I already said, we have found that the increase in Puma memory is primarily due to Ruby heap fragmentation.
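To make that concrete, here is a minimal sketch of how a fragmentation ratio can be derived from GC.stat in plain CRuby. This is only an illustration of the idea, not GitLab's actual metric; the formula is one common approximation.

```ruby
# Sketch: approximate Ruby heap fragmentation from GC.stat.
# A ratio near 0 means eden pages are densely packed with live objects;
# a ratio near 1 means many pages are mostly empty (fragmented).
SLOTS_PER_PAGE = GC::INTERNAL_CONSTANTS[:HEAP_PAGE_OBJ_LIMIT]

def heap_fragmentation
  stat        = GC.stat
  live_slots  = stat[:heap_live_slots]
  total_slots = stat[:heap_eden_pages] * SLOTS_PER_PAGE
  1.0 - (live_slots.to_f / total_slots)
end

puts format('heap fragmentation: %.1f%%', heap_fragmentation * 100)
```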
The third path that we want to work on is to decide how to deal with resource allocation in various environments. In GitLab we have what we call the Puma Worker Killer. This is a piece of Ruby code that runs in the background as a thread and reaps Puma worker processes if they run over a given memory threshold.
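For context on what such a killer looks like, here is a minimal sketch using the puma_worker_killer gem; the numbers are illustrative, not GitLab's production configuration.

```ruby
# config/puma.rb (or an initializer) — illustrative values only.
require 'puma_worker_killer'

PumaWorkerKiller.config do |config|
  config.ram           = 4096 # total memory budget for the Puma cluster, in MB
  config.frequency     = 20   # seconds between memory checks
  config.percent_usage = 0.98 # reap the largest worker once usage exceeds 98%
end
PumaWorkerKiller.start # runs in a background thread, as described above
```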
While we were running this investigation, first of all we realized that this was not running correctly on GitLab.com due to a configuration issue, so we fixed that. But while doing so, we started a larger discussion: does it even make sense to run this type of killer in a resource-controlled environment like Kubernetes? In Kubernetes we define container and other resource limits anyway, so should we turn it off and allow Kubernetes to do its job? This is true for the Puma Worker Killer, and we had a similar discussion about the Sidekiq memory killer.
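To illustrate the trade-off, here is a hypothetical helper (not GitLab code) that checks whether the process already runs under a cgroup memory limit, which is the case inside Kubernetes when container limits are set; if it does, an in-process killer may be redundant.

```ruby
# Hypothetical check: read the cgroup memory limit (v2 first, then v1).
# Returns nil when no limit applies or the files are absent.
def cgroup_memory_limit_bytes
  v2 = '/sys/fs/cgroup/memory.max'
  v1 = '/sys/fs/cgroup/memory/memory.limit_in_bytes'
  raw = File.read(File.exist?(v2) ? v2 : v1).strip
  raw == 'max' ? nil : Integer(raw) # cgroup v1 reports a huge number, not 'max'
rescue Errno::ENOENT
  nil
end

if (limit = cgroup_memory_limit_bytes)
  puts "runtime enforces a memory limit: #{limit / 1024 / 1024} MiB"
else
  puts 'no cgroup memory limit detected'
end
```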
Finally, we want to make a decision on how to set resource limits in all the other environments, for example on Omnibus. For example, we have a maximum amount of memory that each Puma worker can use, and most of the time those limits are hardcoded: they have a default, we set it, and that's it. We set those limits based on our reference architectures, which try to cover most cases. But not all environments are the same, so sometimes those limits can be too low for the environment that each GitLab instance runs in.
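On Omnibus, for example, that per-worker limit is a setting in /etc/gitlab/gitlab.rb. A sketch of overriding the hardcoded default for a larger environment (the value is illustrative):

```ruby
# /etc/gitlab/gitlab.rb — raise the memory threshold at which a Puma worker
# is considered too large; apply with `gitlab-ctl reconfigure`.
puma['per_worker_max_memory_mb'] = 1600
```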
The second priority is to support the effort for FIPS compliance in GitLab. This is an initiative that we have been running throughout the whole of GitLab for a few months now, and the core requirement there is that all communications should be secure. During 15.0 and 15.1, both the Memory group and other groups addressed a lot of those cases. The last thing that remains for us is to add TLS support for the dedicated metrics servers.
Those are the metrics endpoints that are scraped by Prometheus to collect all the metrics, so we need to add support for TLS there. There are two types of metrics exporters. There are the ones that live inside the Rails application, the GitLab Rails application; those run inside Puma, and we have covered them by enabling TLS support in Puma in general. And then there are the dedicated server endpoints for Puma and Sidekiq, and this is the last main thing that we're working on, where we also want to enable TLS.
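Purely as an illustration of what a TLS-secured metrics endpoint amounts to — this toy server is not GitLab's exporter, and the certificate paths are made up — Prometheus simply scrapes /metrics over HTTPS:

```ruby
# Toy HTTPS /metrics endpoint using WEBrick (illustrative only).
require 'webrick'
require 'webrick/https'
require 'openssl'

cert = OpenSSL::X509::Certificate.new(File.read('/etc/gitlab/ssl/metrics.crt'))
key  = OpenSSL::PKey::RSA.new(File.read('/etc/gitlab/ssl/metrics.key'))

server = WEBrick::HTTPServer.new(
  Port: 8083,
  SSLEnable: true,
  SSLCertificate: cert,
  SSLPrivateKey: key
)

# Expose one sample gauge in the Prometheus text format.
server.mount_proc '/metrics' do |_req, res|
  res['Content-Type'] = 'text/plain; version=0.0.4'
  res.body = "ruby_heap_live_slots #{GC.stat[:heap_live_slots]}\n"
end

trap('INT') { server.shutdown }
server.start
```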
So if I search for memory here, I can search over epics and code and issues and merge requests. I can check, for example, references to memory all over the place, inside all of GitLab or within the GitLab organization, or I can go to merge requests, and so on. The problem here is that at the moment we gather those metrics in aggregate, so all those types of searches are accounted for as one type of metric.
We want to increase our visibility. First, we want to differentiate between those types of searches: gather different metrics inside the application, expose them as different Prometheus metrics, and finally build SLIs and SLOs for those metrics. The idea is that we're going to differentiate between basic and advanced search, because they behave completely differently: advanced search is a Premium feature, and if it is enabled the queries go to Elasticsearch instead of PostgreSQL, and those are completely different.
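A sketch of the general approach with the prometheus-client gem (assuming a recent version of the gem; the metric and label names are hypothetical, not the ones GitLab will ship):

```ruby
# Sketch: one histogram, with labels that split searches by type and scope,
# so basic and advanced searches become separate Prometheus series.
require 'prometheus/client'

registry = Prometheus::Client.registry

search_duration = Prometheus::Client::Histogram.new(
  :search_duration_seconds,
  docstring: 'Search request duration, split by search type',
  labels: [:search_type, :search_scope]
)
registry.register(search_duration)

# An advanced (Elasticsearch-backed) code search:
search_duration.observe(0.42, labels: { search_type: 'advanced', search_scope: 'code' })

# A basic (PostgreSQL/Git-backed) issues search:
search_duration.observe(0.08, labels: { search_type: 'basic', search_scope: 'issues' })
```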
With basic search, code and commit searches instead go through Git itself. So in 15.2 we will continue our work on adding those custom SLIs and SLOs.
And our final top priority is to revisit our work on optimizing workers that consume a lot of memory and cause out-of-memory kills. In the past we have investigated some issues and found that there are a few workers that consume a lot of memory: some of them more than one gigabyte of memory, and even two or three of them more than five gigabytes of memory.
And that is in some cases from one worker, a single job run for that specific worker. In general, we don't want any worker to go above 100 or 200 megabytes, so all those workers are problematic. In the past we have addressed the most memory-hungry ones. The fixes varied: for example, we had some issues with the parsing of coverage reports that was taking too much memory, and we reduced that by more than 80 percent. Or, while sending email notifications, generating the email from the templates would consume a lot of memory if the notification had tens or hundreds of comments reported at the same time. And we have fixed other problems with N+1 queries in some controllers.
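For a feel of what such a fix looks like, here is a sketch in the spirit of the coverage-report change: streaming the XML with Nokogiri's pull parser instead of loading the whole document, so memory use stays flat however large the report is. The file and element names are hypothetical.

```ruby
# Sketch: stream a coverage report instead of parsing it into a full DOM.
require 'nokogiri'

def each_covered_line(io)
  Nokogiri::XML::Reader(io).each do |node|
    next unless node.name == 'line' &&
                node.node_type == Nokogiri::XML::Reader::TYPE_ELEMENT

    yield node.attribute('number').to_i, node.attribute('hits').to_i
  end
end

File.open('coverage.xml') do |f|
  each_covered_line(f) { |line, hits| puts "line #{line}: #{hits} hits" }
end
```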
We want to go back, investigate other similar workers, and optimize them. So that's it for 15.2.