Description
Configure team fixes a bug related to the metric that counts the number of unique connected agents.
A: We wanted to discuss this because we implemented a Prometheus metric that tracks the number of unique connected agents, but Mikhail recently figured out that there is a bug: when we look at the metric, we see this weird behavior where the number of agents goes up, then drops to zero, then stays close to zero. And I'm also wondering about something that seems wrong too: the number of connected agent pods should be higher than the number of unique connected agents, right? But it is not. It doesn't seem correct.
B: It calls Register, and then it calls Unregister when it needs to unregister. So in Register we put elements into three sets; they are like secondary indices in a database: connected agents by project id, by agent id, and then just connected agents, a set of the ids of the agents, basically. And then we just get the length of that set here to get the number that we graph. And a particular agent id can be deployed as multiple pods.
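A minimal sketch of what that registration could look like, assuming the go-redis client; the Tracker type, key names, and plain-set layout are illustrative, not the actual kas code:

```go
package agenttracker

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

type Tracker struct{ rdb *redis.Client }

// Register indexes one agent connection three ways, like secondary
// indices in a database: by project id, by agent id, and in a global
// set of unique agent ids.
func (t *Tracker) Register(ctx context.Context, projectID, agentID, connID int64) error {
	_, err := t.rdb.Pipelined(ctx, func(p redis.Pipeliner) error {
		p.SAdd(ctx, fmt.Sprintf("connected_agents_by_project:%d", projectID), connID)
		p.SAdd(ctx, fmt.Sprintf("connections_by_agent:%d", agentID), connID)
		// SADD is idempotent: if several pods of the same agent register,
		// the agent id is still a single member of this set.
		p.SAdd(ctx, "connected_agents", agentID)
		return nil
	})
	return err
}

// UniqueConnectedAgents is the number the metric graphs: the
// cardinality of the set of unique agent ids.
func (t *Tracker) UniqueConnectedAgents(ctx context.Context) (int64, error) {
	return t.rdb.SCard(ctx, "connected_agents").Result()
}
```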
B: They all try to put this id into a set in Redis, basically, and that's what we do here. And if multiple pods put the same id, it's still a single member in the set, and it's counted as a single thing. All good here. And when we remove... we don't unset connected agents? We don't have it here. Oh.
B: When an agent disconnects from one kas instance, other pods may still be connected, and the agent should still be counted, because it is still connected. So we don't remove it here, and instead rely on garbage collection to eventually clean up the ids of agents that don't have a single pod connected. So once all pods disconnect, the garbage collector will remove this id from the set.
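Continuing the hypothetical sketch above (same file and imports), the asymmetry being described would look roughly like this: Unregister cleans up the per-connection indices but deliberately leaves the unique set alone:

```go
// Unregister removes the connection from the per-connection indices,
// but must NOT remove the agent id from connected_agents: other pods
// of the same agent may still be connected, possibly to other kas
// instances.
func (t *Tracker) Unregister(ctx context.Context, projectID, agentID, connID int64) error {
	_, err := t.rdb.Pipelined(ctx, func(p redis.Pipeliner) error {
		p.SRem(ctx, fmt.Sprintf("connected_agents_by_project:%d", projectID), connID)
		p.SRem(ctx, fmt.Sprintf("connections_by_agent:%d", agentID), connID)
		// connected_agents is intentionally untouched; garbage collection
		// removes the agent id once no pod is connected anywhere.
		return nil
	})
	return err
}
```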
B: That's what it correlates with. It's more visible on staging, because on staging we apparently get restarts more often than in production; I guess our infrastructure team does things there, and for some reason kas is restarted more often. But why? When kas is restarted, something happens, and eventually those things are garbage collected. And this is the connected agents hash. So how does this hash work? What do we do? To recap:
B: We set and we don't unset, and we also have refresh and garbage collection. Garbage collection goes through all the items in Redis and removes the ones that have expired, and what refresh does is fight garbage collection: if an item should still be there, because this kas instance is still running and wants the item to stay, refresh puts the item back, rewriting it to update the timestamp.
B: This is why we use an expiring value: we track when a key-value pair in the hash expires in the value itself. So what GC does is go through all the key-value pairs, check the expiration of each value, and then, if it has expired, delete it. That's why we have GC.
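A sketch of that expiring-value idea, with made-up names and a JSON encoding for readability (the real encoding and helpers differ): the deadline lives inside the field's value, refresh re-writes it, and GC scans and deletes:

```go
package agenttracker

import (
	"context"
	"encoding/json"
	"time"

	"github.com/redis/go-redis/v9"
)

// ExpiringValue embeds its own deadline, because an individual Redis
// hash field has no TTL of its own; only the whole key can expire.
type ExpiringValue struct {
	ExpiresAt int64  `json:"expires_at"` // unix seconds
	Data      []byte `json:"data"`
}

// Refresh fights GC: it re-writes a field this instance owns with a
// fresh deadline so the entry survives the next GC pass.
func Refresh(ctx context.Context, rdb *redis.Client, key, field string, data []byte, ttl time.Duration) error {
	raw, err := json.Marshal(ExpiringValue{ExpiresAt: time.Now().Add(ttl).Unix(), Data: data})
	if err != nil {
		return err
	}
	return rdb.HSet(ctx, key, field, raw).Err()
}

// GC walks every field of the hash, checks the deadline embedded in
// each value, and deletes the expired (or unreadable) ones. Any kas
// instance can collect garbage left behind by any other.
func GC(ctx context.Context, rdb *redis.Client, key string) error {
	fields, err := rdb.HGetAll(ctx, key).Result()
	if err != nil {
		return err
	}
	now := time.Now().Unix()
	for field, raw := range fields {
		var v ExpiringValue
		if err := json.Unmarshal([]byte(raw), &v); err != nil || v.ExpiresAt < now {
			if err := rdb.HDel(ctx, key, field).Err(); err != nil {
				return err
			}
		}
	}
	return nil
}
```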
B: If nothing happens in the whole hash, the whole hash will expire, all key-value pairs, but there is no functionality for a particular key-value pair to have its own expiration, unfortunately, so we do it manually. And refresh just bumps these values, but it doesn't do it for all the values in Redis. It only does it for the ones that this kas instance put there, because we only want to refresh what we care about; we don't want to refresh what other kas instances put there.
B: We actually want to delete the things they put there if those kas instances are known to be no longer running, right? If a kas crashes, we want to clean up after it, and that's why we have GC, and we only refresh our own keys. So our own keys are this data here; it's just a map, basically, from key to the... so, the expiring hash. It's not a single hash, it's a hash of hashes: you have a key, and then you have key-value pairs inside of it.
B: This is so we can have, for example, connections by project id: the key is the project id, and then we have connection id to data. So it's a hash of hashes, basically, and the first part, the "by project id", identifies the Redis hash, and then, within that Redis hash, we have the connection id mapping to the value. So that's why it's a hash of hashes. Anyway, this data thing keeps the value until kas restarts, right?
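Structurally, the thing being described could be sketched like this (names hypothetical): the outer key picks the Redis hash, each field within it is a connection id, and an in-memory map mirrors only what this instance wrote:

```go
package agenttracker

import "sync"

// ExpiringHash is a hash of hashes: the outer key (e.g.
// "connections_by_project:<id>") identifies a Redis hash, and within
// it each field (e.g. a connection id) maps to an expiring value.
type ExpiringHash struct {
	mu sync.Mutex
	// data mirrors what THIS kas instance wrote to Redis:
	// redis key -> field -> value. The refresh loop iterates this map,
	// so we only bump our own entries and never keep another
	// instance's stale entries alive.
	data map[string]map[string][]byte
}

// remember records an entry after it has been written to Redis, so the
// refresh loop keeps it from expiring.
func (h *ExpiringHash) remember(key, field string, value []byte) {
	h.mu.Lock()
	defer h.mu.Unlock()
	if h.data[key] == nil {
		h.data[key] = map[string][]byte{}
	}
	h.data[key][field] = value
}

// forget stops refreshing an entry: it is removed from memory only.
// The copy in Redis is left to expire and be garbage collected.
func (h *ExpiringHash) forget(key, field string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	delete(h.data[key], field)
}
```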
B: Let me go through this again. When we set it, we not only put it in Redis, we also put it into the map, so that we can refresh it, see. We put it into the map in memory; this kas instance has this in memory. And then, every now and then, we call refresh, like this RefreshRegistrations, which is called when a ticker ticks: every refresh period the ticker ticks, we call RefreshRegistrations, and we call refresh on all the hashes.
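Continuing the hypothetical ExpiringHash sketch (same imports as the earlier sketches, plus context and time), the ticker-driven loop just described might look like:

```go
// runRefreshLoop re-writes everything in the in-memory map on every
// tick, bumping the deadlines so GC leaves this instance's entries
// alone for as long as it is alive.
func (h *ExpiringHash) runRefreshLoop(ctx context.Context, period time.Duration,
	refresh func(key, field string, value []byte)) {
	ticker := time.NewTicker(period)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			h.mu.Lock()
			for key, fields := range h.data {
				for field, value := range fields {
					refresh(key, field, value) // e.g. the Refresh sketch above
				}
			}
			h.mu.Unlock()
		}
	}
}
```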
B: So we kind of keep poking it so that it doesn't expire and doesn't get garbage collected, because it's still in memory. So it only goes away through garbage collection, like when kas restarts, and the kas that has been restarted, or another kas, eventually sees that the value has expired and deletes it. And then that's the drop.
A: It's like, okay, so it's the new kas that comes up that figures out that that hash was expired, by looking at its value in Redis, and then it cleans it up.
B: See, I...

B: Fine, and then they restarted, and then, like, a couple of minutes later, three minutes later... and then how many minutes later?
B: No, that's not correct. What I think is that it's at this high number because on staging we have QA or something running, so there are new agents. That's my guess! Actually, I don't know that. But the only explanation I can come up with is that on staging we have QA or something that creates new agents, and the new agents connect, and that's why the graph is going up.
B: So the bumps should probably correlate with CI jobs. And this graph also grows, right? It was at one, so there's probably an agent on staging that is always connected, and then, you see, a QA test runs and it's two, and then the test finishes and it's... zero, and then the test runs again and we get a bump.
A: Yeah, but it keeps it in Redis, I see. Yes, it keeps incrementing. Okay, then, yeah, but that's still the weird thing. I think this explains a lot, but if you go to the chart from before, the chart that you had before from staging, you can see that it drops... when it drops, on a restart.
B: Don't... okay.
B: Yeah, so we need to test that it removes it from memory but doesn't remove it from Redis.
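A sketch of what that test could assert, assuming the hypothetical ExpiringHash above and the miniredis in-process server; the real test will use the project's own helpers:

```go
package agenttracker

import (
	"context"
	"testing"

	"github.com/alicebob/miniredis/v2"
	"github.com/redis/go-redis/v9"
)

func TestForget_RemovesFromMemoryButNotFromRedis(t *testing.T) {
	s := miniredis.RunT(t)
	rdb := redis.NewClient(&redis.Options{Addr: s.Addr()})
	ctx := context.Background()

	h := &ExpiringHash{data: map[string]map[string][]byte{}}
	// Simulate a set: write to Redis and remember in memory.
	if err := rdb.HSet(ctx, "key", "conn1", "value").Err(); err != nil {
		t.Fatal(err)
	}
	h.remember("key", "conn1", []byte("value"))

	h.forget("key", "conn1")

	if _, ok := h.data["key"]["conn1"]; ok {
		t.Error("entry should be gone from memory")
	}
	// Still in Redis until GC collects it.
	if err := rdb.HGet(ctx, "key", "conn1").Err(); err != nil {
		t.Errorf("entry should still be in Redis, got: %v", err)
	}
}
```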
A: Yeah, it would fail at some point because of this. Okay, so forget, and the same thing here, the agent id: now we just need the agent id, I guess.
B: It ignores it, it ignores the passed interface on line 23, because, yeah, we don't care about this thing, because we have a single hash. That's why whatever we pass, nil or anything else, doesn't matter, so we pass nil just to make it more explicit, kind of.
A: Yeah, I think you can create... you have a preset for tabs, but I've never done this, and you can even restart commands. You can... I've...
B: ...what I wanted to do is see whether it's possible to run these mockgen commands concurrently, to speed everything up, but then I thought some mocks can depend on other mocks, theoretically, so it's probably a bad idea. But maybe... we probably don't have this problem anyway. I didn't spend too much time researching how to make it generate things concurrently.
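For reference, the idea that was considered (and set aside) would look roughly like this with errgroup; the argument sets are made up, and this is only safe if no mock's output is another mock's input:

```go
package main

import (
	"context"
	"os/exec"

	"golang.org/x/sync/errgroup"
)

// generateMocks runs independent mockgen invocations concurrently.
func generateMocks(ctx context.Context, argSets [][]string) error {
	g, ctx := errgroup.WithContext(ctx)
	for _, args := range argSets {
		args := args // capture the loop variable (pre-Go 1.22)
		g.Go(func() error {
			return exec.CommandContext(ctx, "mockgen", args...).Run()
		})
	}
	return g.Wait() // the first failure cancels the context for the rest
}
```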
A: But I'd have liked to regenerate just the files that I want, yeah.
A: Yes, the thing is that I wanted to send it to Berlin, and yeah, I only arrived there on the 27th. So I put a note there, and yeah, I haven't gotten it yet, but I should get it in the next month, hopefully.
B: This, you mean? No, this is fine. On line 87 you have gomock InOrder around two mock invocations, and each mock invocation returns an object that identifies that invocation; then we tell the mocking framework that we want these invocations to happen in order, and we do that by passing the values that these mocks return to the InOrder function.
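A generic illustration of that mechanism; the Worker interface and its generated mock are hypothetical, and the import path assumes go.uber.org/mock:

```go
package agenttracker

import (
	"testing"

	"go.uber.org/mock/gomock"
)

func TestCallsHappenInOrder(t *testing.T) {
	ctrl := gomock.NewController(t)
	m := NewMockWorker(ctrl) // assumed generated by mockgen from a Worker interface

	// Each EXPECT() invocation returns a *gomock.Call identifying it...
	first := m.EXPECT().Start()
	second := m.EXPECT().Stop()
	// ...and InOrder tells the framework these must happen in this order.
	gomock.InOrder(first, second)

	m.Start()
	m.Stop()
}
```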
B: ...speed things up. I'm talking about speeding Bazel up by moving the sandbox with its symlinks into a memory volume; it's documented in the CONTRIBUTING file, I think. This will run faster, that's what I'm saying. In the docs directory, yeah, this file; I think that's where it's documented. So you create a memory drive and then tell Bazel to use that for temporary files, basically.
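(For reference: a common way to do this on Linux is to mount a tmpfs, e.g. /dev/shm, and point Bazel's --sandbox_base flag at it, so the per-action sandbox directories, which are mostly symlink forests, live in memory; the exact steps in the repository's docs may differ.)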