From YouTube: It's time to Go! Episode 7 - mutex striping
Description
Fixing mutex/lock contention by partitioning the mutex. See https://gitlab.com/gitlab-org/cluster-integration/gitlab-agent/-/issues/386#note_1524010603.
A
Wrong button. Hello, this is "It's time to Go", the reincarnation of that meeting. Thanks for restarting it. This is the first one.
A
I don't know what to talk about, so let's talk about what I'm working on. It's this issue with routing timeouts again: some customers are getting "agent not found, is it connected?" errors. Here is a screenshot of the old dashboard. On the new dashboard you can see that this graph should be zero or almost zero, but there are occasional spikes, and this is not good. It was worse.
A
We see that the... interesting, it's showing you something else again.
A
If you look at it like this, at the one percent, we can see that this thing can take up to one and a half or almost two seconds. And this is, yeah, trying to lock the mutex. This is just the method that counts the delay, I think, because this metric shows the delay. We are looking at the delay of that mutex, basically.
A
I expected this after one of the fixes, but then the profiler was not showing it. Today I found that it is actually showing this, and it's not okay. And the previous version was even worse: this thing was blocked for up to 14-something seconds. That's crazy, I don't know how that is even possible.
A
What this thing does, briefly: it's the tunnel registry.
A
Its job is to register tunnels from agentk in Redis and to find those tunnels across all kas instances. So kas instances can see each other's registrations of tunnels for agents, and then kas can route an incoming request to itself or to another kas instance based on this data.
A
This is the thing. There are two main methods. The first one handles the tunnel: it accepts a connection from an agent and registers it, and then it blocks until it's canceled or the tunnel has been picked up and used, and then it returns. The other important method is to find the tunnel for this agent ID.
A
We try to find a tunnel in this kas, basically.
A
There is an unregister method, which is the same: it's called with the mutex locked. So the problem is that there are a lot of agents establishing lots of tunnels, so they all try to register and unregister, and all of that creates contention on the mutex. What makes it worse is that register and unregister perform I/O operations.
A
As an optimization, actually, register tunnel and unregister tunnel only set the value once. If the kas URL of this kas is already there for this agent, then we don't need to do this, so basically no I/O is performed. But this is still causing some problems, based on the profile. So I think this mutex is the bottleneck, basically, and that's what we've seen in the profiler.
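Editor's sketch of the pattern just described, not the actual gitlab-agent code: one mutex guards the whole registry, and I/O is skipped when the agent already has an entry. The `registry` type, the per-agent counter, and the `store` and `remove` callbacks are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical, simplified stand-in for the tunnel registry.
type registry struct {
	mu      sync.Mutex
	tunnels map[int64]int       // agent ID -> number of registered tunnels
	store   func(agentID int64) // hypothetical I/O, e.g. a write to Redis
	remove  func(agentID int64) // hypothetical I/O, e.g. a delete in Redis
}

func newRegistry(store, remove func(agentID int64)) *registry {
	return &registry{tunnels: make(map[int64]int), store: store, remove: remove}
}

// register records one more tunnel. I/O happens only for the first tunnel of
// an agent, but when it happens, every other caller waits on mu.
func (r *registry) register(agentID int64) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.tunnels[agentID]++
	if r.tunnels[agentID] == 1 {
		r.store(agentID)
	}
}

// unregister drops one tunnel. I/O happens only when the last tunnel of an
// agent goes away.
func (r *registry) unregister(agentID int64) {
	r.mu.Lock()
	defer r.mu.Unlock()
	switch n := r.tunnels[agentID]; {
	case n > 1:
		r.tunnels[agentID] = n - 1
	case n == 1:
		delete(r.tunnels, agentID)
		r.remove(agentID)
	}
}

func main() {
	ioOps := 0
	r := newRegistry(
		func(int64) { ioOps++ },
		func(int64) { ioOps++ },
	)
	r.register(42)
	r.register(42) // second tunnel for the same agent: no I/O
	r.unregister(42)
	r.unregister(42) // last tunnel gone: one delete
	fmt.Println("I/O operations:", ioOps)
}
```

Even with the I/O skipped most of the time, every register and unregister for every agent still goes through the same `mu`, which is the contention being discussed.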
B
That map data structure is shared among the threads, so we use sync.Mutex to synchronize them, right? Like these three data structures. Yes.
B
Just a general idea: maybe we should think about how we can minimize that critical section. At the start we lock the mutex, and at the end of the transaction we unlock the mutex, but we need to minimize the transaction, right?
A
Well, it's not here, it's in the tracker. So we used to run garbage collection in the hash synchronously while holding a mutex, but now we prepare for garbage collection while holding the mutex and then run it without the mutex itself. So this helped quite a bit. I see, so this is a way to minimize the critical section, but...
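Editor's sketch of "prepare under the mutex, run without it", under stated assumptions: the `tracker` type, its fields, and the `deleteRemote` callback are hypothetical, and expiry times are plain Unix seconds to keep the example small.

```go
package main

import (
	"fmt"
	"sync"
)

// tracker is a hypothetical stand-in for the type discussed above.
type tracker struct {
	mu        sync.Mutex
	expiresAt map[string]int64 // key -> expiry time, Unix seconds
}

// prepareGC is the short critical section: it only touches in-memory state.
// It removes expired keys from the map and returns them.
func (t *tracker) prepareGC(now int64) []string {
	t.mu.Lock()
	defer t.mu.Unlock()
	var expired []string
	for key, exp := range t.expiresAt {
		if exp <= now {
			expired = append(expired, key)
			delete(t.expiresAt, key)
		}
	}
	return expired
}

// runGC performs the slow part (for example a network round trip) after the
// mutex has been released, so other callers are not blocked by it.
func (t *tracker) runGC(now int64, deleteRemote func(keys []string)) int {
	expired := t.prepareGC(now)
	if len(expired) > 0 {
		deleteRemote(expired) // no lock is held here
	}
	return len(expired)
}

func main() {
	tr := &tracker{expiresAt: map[string]int64{"a": 100, "b": 200}}
	n := tr.runGC(150, func(keys []string) { fmt.Println("deleting", keys) })
	fmt.Println("expired:", n)
}
```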
A
Interesting. So we use it in multiple places. First, in this case, that's where we store tunnels by agent ID, so the key is the agent ID.
A
That way we can look up a list of kas URLs that have a connection from this agent ID. There are a few other places where we use this, where the value is not nil. So the purpose of this is to add expiration to the second level of the mapping, because in Redis you can have an expiration on the whole hash, but not on a key-value pair inside of the hash.
A
So we just implement this ourselves by putting a time to live in there and periodically scanning everything, deleting what has expired and updating what hasn't.
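Editor's sketch of per-field expiration on a two-level mapping. This is an in-memory illustration only; in the discussion the data lives in Redis. All names here (`expiringHash`, `Set`, `Refresh`, `GC`) are hypothetical, and times are Unix seconds.

```go
package main

import "fmt"

// entry stores a value together with its own expiry time.
type entry struct {
	value     string
	expiresAt int64 // Unix seconds
}

// expiringHash emulates per-field expiration: key -> field -> entry.
type expiringHash struct {
	ttl  int64 // time to live in seconds
	data map[int64]map[string]entry
}

func newExpiringHash(ttl int64) *expiringHash {
	return &expiringHash{ttl: ttl, data: make(map[int64]map[string]entry)}
}

// Set stores a field with a fresh expiry time.
func (h *expiringHash) Set(key int64, field, value string, now int64) {
	fields := h.data[key]
	if fields == nil {
		fields = make(map[string]entry)
		h.data[key] = fields
	}
	fields[field] = entry{value: value, expiresAt: now + h.ttl}
}

// Refresh extends the expiry of a field that has not expired yet.
// It reports whether the field was refreshed.
func (h *expiringHash) Refresh(key int64, field string, now int64) bool {
	e, ok := h.data[key][field]
	if !ok || e.expiresAt <= now {
		return false
	}
	e.expiresAt = now + h.ttl
	h.data[key][field] = e
	return true
}

// GC scans everything, deletes expired fields, and returns how many it removed.
func (h *expiringHash) GC(now int64) int {
	deleted := 0
	for key, fields := range h.data {
		for field, e := range fields {
			if e.expiresAt <= now {
				delete(fields, field)
				deleted++
			}
		}
		if len(fields) == 0 {
			delete(h.data, key)
		}
	}
	return deleted
}

func main() {
	h := newExpiringHash(60)
	h.Set(1, "kas-a", "", 0)
	h.Set(1, "kas-b", "", 0)
	h.Refresh(1, "kas-b", 30)             // kas-b now expires at 90
	fmt.Println("deleted:", h.GC(70))     // kas-a expired at 60
	fmt.Println("fields left:", len(h.data[1]))
}
```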
B
Yes. Then the maximum number of mutexes will be the number of agent IDs, or some number?
A
No, it shouldn't be necessary. What I'm thinking is just using... so I have prepared this striped value already, I think. You can see a good explanation of this whole idea in Guava, actually. It's a Java library from Google with various versions of locks.
A
A single stripe is this: a mutex plus a value, some value using a generic. Then we have a list of those things, and we use a power of 2 so that we can use bitwise operations to get the remainder of the division instead of actually using division with remainder, because that's much faster. We just AND with the mask and get the least significant bits, and we prepare the mask here.
A
So this is all bits set: the maximum 64-bit unsigned integer, so all bits are set. Then we shift it.
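Editor's sketch of the mask preparation, assuming the number of slots is 2^n. The function names are hypothetical. It starts from a value with all 64 bits set and shifts it; shifting right by 64 minus n is an equivalent alternative. ANDing with the mask gives the same result as the remainder of division by 2^n.

```go
package main

import "fmt"

// lowBitsMask returns a mask with the n lowest bits set, for 0 <= n <= 64.
func lowBitsMask(n uint) uint64 {
	all := ^uint64(0) // all 64 bits set
	return ^(all << n)
}

// lowBitsMaskAlt builds the same mask by shifting right by 64 - n.
func lowBitsMaskAlt(n uint) uint64 {
	all := ^uint64(0)
	return all >> (64 - n)
}

func main() {
	const n = 3 // 2^3 = 8 slots
	mask := lowBitsMask(n)
	fmt.Printf("mask = %b\n", mask)
	for _, id := range []uint64{5, 8, 13, 1023} {
		// id & mask and id % 8 pick the same slot.
		fmt.Println(id, id&mask, id%(1<<n))
	}
}
```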
A
Or we could shift right by 64 minus n, something like this, it doesn't matter. So this is our mask, and we just pick the slot in this slice, and in that slot we have a mutex and the value. This is our striped value: mutex plus data, not just the mutex. It is just more convenient to use, and we have a lock for this ID.
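Editor's sketch of a striped value as described: a slice of 2^n slots, each holding a mutex plus generic data, with the slot chosen by ANDing the ID with a mask. The type and function names (`Stripe`, `Striped`, `NewStriped`, `For`) are assumptions, not the names used in the actual change.

```go
package main

import (
	"fmt"
	"sync"
)

// Stripe pairs a mutex with the data it protects.
type Stripe[V any] struct {
	Mu    sync.Mutex
	Value V
}

// Striped holds 2^n stripes and a mask for picking one by ID.
type Striped[V any] struct {
	stripes []Stripe[V]
	mask    uint64
}

// NewStriped creates 2^n stripes; newValue initializes the data of each one.
func NewStriped[V any](n uint, newValue func() V) *Striped[V] {
	size := 1 << n
	s := &Striped[V]{
		stripes: make([]Stripe[V], size),
		mask:    uint64(size) - 1, // the n lowest bits set
	}
	for i := range s.stripes {
		s.stripes[i].Value = newValue()
	}
	return s
}

// For returns the stripe responsible for the given ID.
func (s *Striped[V]) For(id int64) *Stripe[V] {
	return &s.stripes[uint64(id)&s.mask]
}

func main() {
	// 8 stripes, each with its own map of agent ID -> tunnel count.
	reg := NewStriped(3, func() map[int64]int { return make(map[int64]int) })

	var wg sync.WaitGroup
	for agentID := int64(0); agentID < 100; agentID++ {
		wg.Add(1)
		go func(id int64) {
			defer wg.Done()
			st := reg.For(id)
			st.Mu.Lock() // only contends with IDs that map to the same stripe
			defer st.Mu.Unlock()
			st.Value[id]++
		}(agentID)
	}
	wg.Wait()

	total := 0
	for i := range reg.stripes {
		total += len(reg.stripes[i].Value)
	}
	fmt.Println("agents registered:", total)
}
```

Keeping the data next to its mutex means a caller that picked a stripe by ID already has both the lock and the map it guards, so there is no separate lookup under a global lock.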
A
Because this thing requires GC and refresh, at the same time I'm removing the mutex. Let me show: I'm removing the mutex from here, from the tracker, and removing garbage collection and refresh from here as well. You see, I'm removing the mutex, because now the registry will be responsible for calling the garbage collection and refresh, because...
A
It kind of doesn't make sense to hold the lock in a nested way, to have the lock in the second layer if the first layer always holds the lock already. So I decided to remove it from here and to have a single layer of locking, to simplify things and speed things up as well. But yeah, I'm thinking of doing it like...
B
How did you decide on n, and the eight in that line?
B
I still don't fully understand that bitwise operation, like the mask. So, given the number six or eight here, so...
B
Okay, another question. Could you go to the data structure of the state? Oh yes.
B
This is something very technically challenging, and...
A
People just made do without those interesting collections and other things, because it wasn't possible to write such libraries before generics.
A
With generics, people will probably start writing different collection libraries and so on.
A
And these are different, yeah.