From YouTube: GitLab Kubernetes Agent: TunnelRegistry deadlock fix
A
Hello, my name is Mikhail. I'd like to talk about this issue that I fixed recently. We were seeing timeouts in this functionality, where kubectl couldn't get a reply from the server, from kas, in 32 seconds or at all.
A
Before bumping the deployment to a new version (thank you, Graham), a goroutine stack trace dump was taken using pprof, and that helped us a lot to understand what's wrong. You can see the stack traces here. Then I opened the merge request fixing the deadlock. So yeah, it turned out it was a deadlock, and I want to talk about how I found the problem, how to read stack traces, and, to start with, how the tunnel registry works in general.
A
So this was 14.5 rc1, so let's check out that version.
A
rc2 is the version with the fix. This is the version that was buggy.
A
So how does it work? The tunnel registry implements two interfaces. The first is TunnelFinder, with FindTunnel: this is the method that is called by the kas that needs to find a tunnel to an agent. The second interface is TunnelHandler, which is also called by kas, but by the one that accepted the connection from an agent.
A
So first we register the connection, basically an incoming connection from an agent, and then some other connection, from another kas, will try to find the correct tunnel to forward the request to. So these are the two interfaces that the tunnel registry implements. As for the way it works: if you have heard of the actor model, or actors, then you should understand how it works.
A
So an alternative approach is to have a piece of shared state, and then whoever needs to do something with it, read or write, grabs the lock, performs the mutations or whatever it needs to do (read, write, or both), and then releases the lock. Then some other thread may come along to do something else. With actors, the actor owns the state, and it's a goroutine that just runs and listens for incoming messages. When it gets a message over a channel, it does something, usually depending on the message, and then it may send out messages, depending on what one wants it to do. So the difference is that there are no locks. There is a dedicated goroutine, and it interacts with the outside world, with other goroutines, via channels.
A
Plus, contexts just compose nicer with blocking calls on channels. You can also mix in periodic things that need to happen within that goroutine: for example, you can have a timer that fires every N seconds to do some garbage collection or some background activity, even if there are no callers. If you do this via locks, then you need to have a separate goroutine.
A
This is the first one, and then there are requests to unregister when the context is cancelled. Then we have requests to find a tunnel, and requests to stop looking for a tunnel when the context is done. So let's look at HandleTunnel. And yeah, this is where we receive the register, unregister, find, and abort find (stop finding) requests.
A
Then we wait either for the cancellation or for an error to be returned, to unblock the incoming connection and return whatever needs to be reported to that connection. If the context was cancelled here, we haven't registered, so we just return. And here we have registered, so we need to unregister, and that is what we do: we try to tell the Run goroutine that we want to unregister.
A
Imagine using a mutex here: you would need to have another goroutine that waits for the context to be cancelled. Then you would need to use broadcasts to tell whoever holds the... yeah, I don't even know how that would work. I mean, of course everything is possible, but it's just better with channels in this particular case.
A
So let's look at the other method, which is also called by kas.
A
We just change the state; there is a simple state machine inside. Once we are done, we need to send nil to the return channel so that this method is unblocked and returns something here. Otherwise it will get stuck forever. You see, it's a deferred function, so even if we have an error here, let's say, we still unblock the caller.
A
Here, if the context was cancelled, we need to tell the Run goroutine that, for this find request, we are no longer interested in the result. And if a result actually arrives concurrently with this, and we got a tunnel that is not nil, then we just, you know, call Done on it immediately.
A
So let's look at the Run method and what it does. Well, I have shown you: this is register, it handles the registration; then unregister, it handles the unregistration; and then find and find abort. The state is maintained in these maps. We have tunnels by agent ID: this is just like a secondary index in a database, to make it easier to find tunnels by agent ID. And then we also store find tunnel requests by agent ID.
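As an illustration of the "secondary index" idea, here is a small sketch that keeps a primary set of tunnels and a second map keyed by agent ID in sync. The type, field, and method names are assumptions made for this example and are not necessarily the ones used in kas.

```go
package main

import "fmt"

type tunnel struct{ agentID int64 }

// index keeps the primary set of tunnels plus a secondary index by agent ID.
type index struct {
	tuns          map[*tunnel]struct{}
	tunsByAgentID map[int64]map[*tunnel]struct{}
}

func newIndex() *index {
	return &index{
		tuns:          map[*tunnel]struct{}{},
		tunsByAgentID: map[int64]map[*tunnel]struct{}{},
	}
}

// add inserts t into both maps.
func (i *index) add(t *tunnel) {
	i.tuns[t] = struct{}{}
	byAgent := i.tunsByAgentID[t.agentID]
	if byAgent == nil {
		byAgent = map[*tunnel]struct{}{}
		i.tunsByAgentID[t.agentID] = byAgent
	}
	byAgent[t] = struct{}{}
}

// remove deletes t from both maps and drops empty per-agent sets.
func (i *index) remove(t *tunnel) {
	delete(i.tuns, t)
	byAgent := i.tunsByAgentID[t.agentID]
	delete(byAgent, t) // deleting from a nil map is a no-op
	if len(byAgent) == 0 {
		delete(i.tunsByAgentID, t.agentID)
	}
}

// countFor reports how many tunnels are registered for an agent.
func (i *index) countFor(agentID int64) int {
	return len(i.tunsByAgentID[agentID])
}

func main() {
	idx := newIndex()
	a, b := &tunnel{agentID: 7}, &tunnel{agentID: 7}
	idx.add(a)
	idx.add(b)
	idx.remove(a)
	fmt.Println(idx.countFor(7), len(idx.tuns))
}
```

Because only the actor goroutine touches these maps, keeping the two in sync needs no locking; it only needs every code path to update both.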
A
So yeah, I will not go into the details of how this works, but you can imagine how; there's nothing super interesting. So, we have checked out the correct revision. Let's look at the stack traces.
A
We have 815 goroutines blocked trying to register, so they are calling HandleTunnel. These are all the goroutines from agents that are trying to establish a new tunnel connection. We can see this here from the stack trace: it's the Connect method. We can just copy this, press Shift twice, paste, and go there. Your favorite editor can probably do this as well, or you can just navigate to the path and line. You see where it is called, so let's look inside HandleTunnel, where it is actually blocked. It's blocked on this select, and that means it's trying to register this tunnel. Here we do not see individual goroutines because it's an aggregated view: pprof tells us that there are 815 goroutines blocked here. There is also a non-aggregated view, which pprof can give you as well; there is a query parameter for that.
A
So anyway, they're all blocked trying to register the tunnel, and that's not normal. It should be instantaneous, because the actor goroutine doesn't do anything else. It just reads from channels, does in-memory modifications of the state it owns, and then replies via other channels that are passed in. The find request passes a channel, for example, and the register request also passes a channel to reply on for that particular request.
A
So it should be instantaneous. That goroutine doesn't perform any I/O. It doesn't perform anything that can block for a long time, because if it blocks, there's a backlog very quickly. Then we also have 773 goroutines that are calling HandleTunnel. So these are 700-something goroutines that were trying to perform an API call, and they got stuck, and that's also not how it should work, obviously. You can also just navigate to the line and see where they are blocked. In this case, actually, they have registered. Interesting.
A
Yes, so they're trying to unregister. Okay, these ones have registered, but then they were trying to unregister. Oh, okay, this is still HandleTunnel, not FindTunnel. So yeah, these are the ones that are trying to unregister because their contexts have been cancelled, and they are stuck here. There's no reply returned, and they also couldn't unregister.
A
So this is ForwardStream, which is called by the incoming request, and they are all blocked trying to read from the goroutine performing the proxying, but it's stuck. Okay, yeah, this is a different problem, which was fixed separately. And this is the problematic stack trace. There is only one, because that's the actor holding the state, so there is always only one; otherwise we would use mutexes. And it's handling the unregistration of a tunnel.
A
Yeah, it should be checking whether this particular tunnel is still registered with the actor goroutine or whether it has been removed. If it hasn't been removed, if it's still registered, then we call Done to unblock it, and then we remove it from our in-memory state. And yeah, a wrong if condition led to the deadlock.