From YouTube: SIG - Performance and scale 2022-08-04
Description
Meeting Notes:
https://docs.google.com/document/d/1d_b2o05FfBG37VwlC2Z1ZArnT9-_AEJoQTe7iKaQZ6I/edit#heading=h.tybh
A: All right, everyone, welcome to SIG Scale. It's August 4th. I'll put a link to the notes in the chat so you can follow along. Okay, and please add yourself as an attendee, for the record. Okay, all right, let's get started. Olay, you've got the first item; why don't you kick us off?
B: Sure, so this is just following up from the last call. We discussed that I would file an issue for measuring deletion time, like the p99s and p45, I think, of the deletion time. So I started looking at the metrics for deletion seconds, and one thing I found was that the KubeVirt VMI phase transition time is actually the one that is used for creation time as well.
B: So I was looking for a parallel on deletion time, and, correct me if I'm wrong, but based on how I understand the deletion workflow: the user issues kubectl delete vmi and the pod gets deleted. Well, first the controller slaps on the finalizer, then the pod gets deleted.
B: We wait for the termination grace period; libvirt, via the launcher pod, issues a delete or a SIGTERM, all the pods go down, and then the finalizer gets removed. Before that, the phase will transition to Succeeded or any of the final states, and then the finalizer will be removed. So I think there is a distinction between when the final state was reached and when the object was actually finalized.
A: Yeah, so what I believe is that the current metric, the one you have highlighted here, should measure the time from when we observe the deletion timestamp on the object all the way until the finalizer is removed. I believe that's what it does. So it doesn't capture when the pod is actually cleaned up, or the VM gets removed, or anything like that. And so, to your question: okay, does it matter?
B: No, actually, when I was looking at the code (I think I have the link there), the metric seems to run from when the delete call is issued to when you see the phase transition in the status, right? But removing the finalizer is not a phase transition; it's not reflected in the status phase. So I'm not sure.
A: Yeah, so it's not, but it should. What I remember about this is that we should stop measuring when the finalizer is... right, it's not actually when the finalizer is removed; it's right before it's removed. That's where we should be measuring.

A: That's what I remember for this. So we do a delete, we populate the VMI with the deletion timestamp, and I think that's when we start, and then we finish right before we remove the finalizer. I just want to make sure I'm getting to your question, though: that should cover the deletion time spent in KubeVirt.
B: I see. So if you go to the get transition time seconds function... yeah, from what I'm understanding, the transition is happening in line 54; that is just a phase transition timestamp. So it seems that we are measuring from the deletion timestamp up to the phase transitioning to a final state, which is correct in the sense that we are measuring things that KubeVirt is responsible for. But if, let's say, due to some other reason the object is not finalized and we have not removed the finalizer after it has transitioned, those kinds of bugs... we don't have any metrics to reflect that. The rest of the time is the time from the final transition to the finalizer actually being removed.
A: Well, my concern is, I'm not sure how we'd add it; maybe you have some ideas. The challenge is that once we say we're removing the finalizer, our expectation is that it's going to be gone. Once it's removed, it's not guaranteed that we'll actually be able to see the object again. So there's no...
B: Yeah, so when we added stuff in the last PR, the garbage collecting of VMI objects after deletion, what I did was attach update and delete functions to the informer, and if the object is missing from the informer, that's when I say, okay, now it is garbage collected. So it could be just a bit after it was actually removed. We could be off by a small delta from when the informer gets updated, but it would be a fairly close representation of when it was actually deleted and finalized.
A: I mean, I guess it's really a matter of how we want to characterize this, then. I'm kind of open to going either way on this. To me, at least, with the first approach for this the idea was that we'd only capture what KubeVirt is doing, because that isolates the possibility that there's a problem on the KubeVirt control plane; we're trying to measure our performance and only the things that we touch.
B: And I think with that data... I remember a couple of bugs that we ran into due to the launcher pod that maintains the state of the pod in kubelet, and that then being reflected in the controller through the custom informer. I think the bugs were that a lot of time was being spent after going into the final state and before being actually finalized; I remember that was one of the first bugs I actually looked into. So it feels like if we add that time, those kinds of bugs would be easier to catch. I realize those are very rare bugs, but at the same time they're hard ones to go after, and having all these kinds of data points would be useful.
A: Yeah, so I would say they're rare, and you're right that they can exist. They're rare because we're stopping our measurement right before we're supposed to remove the finalizer. So what would have to happen is that something would go wrong in Kubernetes, or we would fail to remove the finalizer, or something like that; I think that's what would cause this to happen. But even...
B: Even something could go wrong in our logic, right, in KubeVirt logic, because after that final transition KubeVirt does do some work. It actually goes ahead and finalizes it, so it needs to clean up the links in the directory in the launcher pod and stuff like that. But yeah, I am open; I just wanted to bring this up. If this is worth adding, I can create an issue and go after it.
A: Oh yeah, I think it is. I just think what we have to figure out is how to characterize this, because the trouble I'm having with it is that we're currently measuring, as best we can, the time KubeVirt spends handling the object during deletion. The rest of it we could get, but it's about finding a way to characterize it, because I think the way we have it now is valuable.
A: On the creation side there's the same thing: there's the unknown state at the start. When we create the object, we're in an unknown phase, because we haven't actually added a phase to it yet. So there's a period of time that actually is captured in the transitions.
A: We don't do that on the deletion side, so we are technically in an unknown state after we expect to remove the finalizer, because it's out of our hands. I think we could do the same thing, maybe characterize it the same way: we're just in a different state, or something like that. Maybe that's a way to communicate it.
B: Makes sense. I was thinking "time for object to get finalized," something like that, because it's a well-known term, right: getting an object finalized means waiting for the finalizer to be removed. So I was thinking something along those lines would appropriately classify it. The only distinction I think we need to be careful about is that it could be confusing, in the sense of whether it captures the entire span from the user issuing the delete request up to being finalized, versus from KubeVirt having observed the final state up to the actual finalizer being removed. Those two things will be an important distinction to make in the language of that metric. But that was what I was thinking; I'm not sure if that helps.
A: We'd just distinguish from the time that KubeVirt is done with it to the time that it's being garbage collected, and that might just be an additional step that we have in this metric. We just make it a special case: we don't say, okay, this was the phase it transitioned to, or something; it's really that... I don't know, we have to...

A: That's why I want to use unknown, because technically, once we've reached whatever phase, Failed or Succeeded, and we don't have a finalizer, it's a totally different state than if we do. So we could call it unknown, maybe, or we could just make up our own thing. Like I said, "final state garbage collected," whatever it is; we just have to make an exception here. We don't post it as a phase, Failed or Succeeded; we say something else.
A: That might be easier, because then you don't have to create a whole new metric; you can just integrate with it. And it allows someone to look end to end, compared to the deletion timestamp if they want to, and also compare it to KubeVirt's current metric, the current phase transition: so from Failed to whatever we call it, garbage collected, how much time it takes.
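One way to read that suggestion is a synthetic, non-API label emitted alongside the real phases. A minimal sketch, assuming hypothetical names (KubeVirt has no such label today; this is only an illustration of the idea):

```go
package main

import "fmt"

// Real terminal VMI phases plus one synthetic label for the span between
// the final phase and observed garbage collection. The synthetic name is
// a made-up placeholder, not part of the VMI API.
const (
	phaseSucceeded        = "Succeeded"
	phaseFailed           = "Failed"
	phaseGarbageCollected = "GarbageCollected" // synthetic, not a real phase
)

// transitionLabel returns the label to record for a transition into the
// given state, flagging the synthetic one so consumers can tell it apart
// from genuine status phases.
func transitionLabel(state string) (label string, synthetic bool) {
	if state == phaseGarbageCollected {
		return state, true
	}
	return state, false
}

func main() {
	l, syn := transitionLabel(phaseGarbageCollected)
	fmt.Println(l, syn) // GarbageCollected true
}
```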
B: Right, yeah, I think that's what I had in mind. I was not suggesting updating the current logic, because folks are already used to it and we don't want to break that compatibility. Maybe adding a new metric, like you said, would help cover all the cases and would be an additional data point moving forward.
B: Okay, yeah, then I will take an action item to follow up on this discussion. So I now have two issues that I have to file: one for this and then one for the percentiles.
A: The metrics section here, let me summarize: so we'll add... like matty... we'll call it an additional...
A: I think we've just settled on 200. Let me check... looks like it's just 200. Let's see.
A: Okay, I don't remember; somewhere in here there's the... I'll have to look, I'll pull it up after. But it looks like it's 200, so I think it's just the... yeah, okay, wait a second. All right: 205, delete... 205.
A: Events counts; I guess that's fine, 205, 205! This is good. Actually, we weren't getting delete pod counts before; this is actually good, we can actually...
A: This is good to see, and we're going to have this now. Okay, let's see, everything else in here looks just about the same; you're like testing the low numbers. Patch events counts, patch virtual machines, this is the counts, okay, and then here's our update VMI count. Okay, let me compare this to the... let's go to the periodic; it doesn't have 200. I want to compare.
A: All right: update virtual machine instance count. Okay. And the other thing is, we shouldn't see things... yeah, we shouldn't see them. Okay.
A: So let's see, 10 to 1... 2,146. This is going to be a little different, I think, because of the deletes. Okay, yeah, so we're pretty close; we're actually still under on the ten-to-one for the updates.
A: It'll be close, so yeah, that's very close. I mean, this is the same code, so it shouldn't vary much. So I think it's just the deletes that's causing this. But I think what we'll need to do with this one is evaluate how we can set the threshold so that we can also pick up deletes in here. So maybe it needs to be higher; okay, something we can consider.
A: So, since we're doing the deletes, any delete that we're making is going to cause HTTP requests to be made for the VMI, right, to update the spec or the status, and that's what we're picking up here: adding the finalizer, removing the finalizer, and so on.
A: No, I'm almost positive we're not deleting; like 99% sure. So we're getting different numbers. Look at that, you can see here's patch virtual machine instances, here's patch; it's completely different. This one is just straight create and check, yeah.
A: So we'll have to look at adjusting this. It's interesting to see here, though: look, we have 17. The threshold is roughly two to one; since we're at roughly 100-something, it's about two to one against creates, or well under that, and the patches... but here it's almost one to one.
A: So this is good data. I think we'll have to look at how we can make some thresholds here. This is really good, though. What's also interesting is this: we're at a hundred more VMIs, and look at the difference between our p50, p95 and p99. I'll disregard this one and just say these two, the 15 and the 19 and a half, are quite far off, which is interesting.
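For reference, the p50/p95/p99 values being compared here are percentiles over the latency samples. A minimal nearest-rank sketch with made-up numbers (Prometheus itself estimates quantiles from histogram buckets, which can differ from exact sample percentiles):

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the q-th percentile (0 < q <= 100) of samples using
// the nearest-rank method; it copies and sorts the input.
func percentile(samples []float64, q float64) float64 {
	s := append([]float64(nil), samples...)
	sort.Float64s(s)
	rank := int(math.Ceil(q / 100 * float64(len(s)))) // 1-based nearest rank
	if rank < 1 {
		rank = 1
	}
	return s[rank-1]
}

func main() {
	// Illustrative phase-transition latencies in seconds (invented data).
	lat := []float64{1, 2, 2, 3, 3, 4, 5, 8, 15, 19.5}
	fmt.Println(percentile(lat, 50)) // 3
	fmt.Println(percentile(lat, 95)) // 19.5
	fmt.Println(percentile(lat, 99)) // 19.5
}
```

A few slow outliers barely move the p50 while dominating the p95/p99, which is why the tail percentiles can sit far from the median as observed above.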
A: I wonder if it's the extra 100, if that's what's heavily contributing to these. That would be interesting.
A: Okay, well, we'll keep an eye on this now that we're getting the data; this is really good. We'll do a comparison maybe in a week or two: we can take a look at three or four of these jobs and try to settle on something we deem reasonable. I think the most important ones are the updates, the p90, p95 and p50, and the patches.
B: So I had a question regarding that. Is there a way we could get these numbers without scraping the Prow CI UI? Where do these numbers come from; is that a Prometheus server that's running somewhere?
A: Yeah, so it's from Prometheus. What this is, is the perfscale-audit tool; I can show you. All it does is point at the endpoint in Prometheus, and then it generates an output that you can read. I was trying to find an example. Yeah, that's right, you just point it: you give it a URL, you point it at the Prometheus instance, and it will scrape for you. You can give it a period of time and it'll do the rest for you; it'll generate output in this way.
B: Yep. So I was wondering, for the other issue where I need to look at deletion time, if I can get the URL for that Prometheus instance, so maybe I can run some queries before putting it in the code.
A: Yeah, sure, let me see if I can find... well, the easiest thing for you to do is to do it locally with make cluster-up. You can just add the Prometheus instance to your cluster; I think it's just a flag, you set it to true and then it'll deploy it, and then you should be able to port-forward it locally and run the tool against it.
A: That would be the easiest way to do this, so you don't have to go through the CI. Sounds good. Okay, all righty, we'll take a look at this and see if we can get some thresholds. This is good to see now that we're getting some more data; we'll have to continue to look at these and see if we can settle on a threshold for some of these. Okay.
A: Oh, actually, let me see one more; I forgot if there's... so, cluster performance: we've got the hundred density, okay, so here's the density. Let's look at this one. This is the same test, it's just a hundred; the other one's 200, okay, that makes sense. Let's do a comparison of these two, and let's look at the p95 and the p50 and see how they come out.
A: Okay, so a little bit closer, but still a little off. So here's our periodic; these are running in the dedicated cluster. Okay, so we're a little bit off; it's almost double. But how close is it to these? So it looks like, at 36 we get a 59, and we go from 49 to 89. Interesting. Okay, we're going to follow up on this; let's keep an eye on it in future meetings. We'll keep an eye on this comparison.
A: Okay, that'll do; we'll just do some follow-up on that one. All right, let's go to the libvirt-go memory leak. I saw this one in there; over to you, if you're still here.
C: Yeah, basically, the BCC tools revealed the leak. The root of the leak comes from the scrape generated by Prometheus.
C: Basically, it was a wrong data type in the libvirt-go module, and this generated an error that wasn't propagated, so the allocation remained in memory. Actually, there were three allocations for each scrape call. And yeah, it was hard to find out where it came from, because it was outside our code.
C: When I say "our," I mean KubeVirt. But yeah, after that, the people from the libvirt-go module found it really, really fast, and it was solved in the end. Okay.
A: Great. So do we need to backport this? That was one of the questions I had, because I think... oh, looks like you already have one here, so 0.53... yes, 0.53, I mean, okay.
C: Yes, we have backported it on release 0.53, and probably we should backport it also for 0.54 and 0.55.
A: Brian, I just had a question on this one. Do you remember there were a number of PRs to increase the memory on the performance jobs? I was just wondering, was that an early sign of this memory leak, or do you think it's unrelated? Could it have been an early sign?
C: Yeah, I don't know. I can tell you that the memory leak was triggered by a flag that we enabled for the metrics of the virt-launcher, in the get-all-domain-stats call: we enabled asking for the dirty rate in libvirt, and this generated the leaked memory. But I don't know if it is related to what you say.
C: No, I don't think so. The scrape from Prometheus runs, I think, every 10 seconds probably, and the leak was about 219 bytes. So, since Prometheus calls the scrape... I don't remember exactly, but I think once every 10 seconds... it would eat about one megabyte per day, roughly.
C: No, this one basically only introduced a test that checked the memory of the virt-launcher, the virt-launcher monitor, and all the processes in the virt-launcher pod, sorry. So this PR can be considered the cause only in the sense that this is when we started to look at that; the PR that caused the leak was another one, or at least that's the one that revealed the leak.
A: Okay, all right. I have one more thing for you, Federico, while you're looking. What did you have for ideas in terms of tooling that we could use for detecting these memory leaks inside the launcher? Did you think about that at all?
C: Sorry, I was concentrating on searching for the PR. If I understand correctly, you're asking about the eBPF tooling, yeah? Okay, so basically the BPF tool that we use is BCC, the BCC memleak tool. There is a way to install it directly inside the virt-launcher pod, but it requires a lot of stuff.
C: So what we have done is install these tools directly on the node, and search for and attach them to the virt-launcher process. We could install them inside the virt-launcher pod, but I think the pod would then increase its requested memory, and I don't know if that is good.
A: Yeah, well, what I was thinking is, I mean, I wouldn't expect to have to use the tooling much, unless we knew there was a memory leak or we wanted to investigate one. So what I would expect is that there would be a virt-launcher image like what's produced today, but what we could also do is have... I mean, it could be a separate build. Maybe it's shipped with releases, maybe it's not.
A: Maybe it's just something that you could build if you want, like through make bazel-build or something. The idea is that maybe we could have a way to include the package in there, just so that you won't have to do all the things that you just went through to diagnose this; have it in the launcher image.
C: Yes, it can be done. It's not quite easy, because there are other things to do, such as installing the kernel modules and some other stuff. But yes, it can absolutely be done, I think.
C: You need the SYS_ADMIN capability, or to run the pod directly with root privileges.
C: So yeah, you need those capabilities. Or, at least, it depends on which kernel version you are using, because on more recent kernel versions BPF has a specific capability of its own. But yeah, I think this... okay.
B: I wonder if it would be beneficial to have this as a totally separate image, regardless of the launcher image, like a debug-tools image, and then attach the debug tools as a sidecar container to the virt-launcher pod. Whenever virt-launcher is not behaving properly, or as expected, we can attach the sidecar and then launch the debug-tool process against the virt-launcher process.
C: Oh, if you have a separate container, I don't know if you can go inside another container and debug it with root privileges.
C: So maybe what we can do is have tools that will be installed directly on the node, or at least we can create two versions of the virt-launcher, with, I don't know, for example, separate tags: one is devel, which is the standard one, and one for the debug tools. Then we build the two different images, the first one with the current status quo of the virt-launcher, and the new one with the debug-tools tag, for example.
C: There we can install all the debug tools that we need. Basically, I didn't investigate which other debug tools, eBPF tools, we could use, but I saw that there are a lot of binary tools that could be used; I just don't know in depth which ones.
A: Okay, yeah, I mean, I think either idea could maybe work. It's just a matter of whether we have the right capabilities to do it, the sidecar, because we should be able to access the processes that we need, and we should be able to get the information we need. It's just, you know, what else do we need to change? It sounds like they're both doable, but I think in both cases we may need to do something with the caps to make it work, and then both of them would have to have some sort of build pipeline or something associated with them, because they'd be their own images.
B: Yeah, okay. So I had a question: was SYS_ADMIN required to execute those BCC tools, or required to install BCC? I could be wrong here, but from my understanding, the eBPF tools run in user space against the Linux kernel.
A: Okay, well, I think that's something we can think about. It was definitely something we could look at and prevent, because we have some metrics for this stuff; I'm just trying to think of other ways we could make this process easier, other ways we can detect this when we do our performance analysis. All right, I think this is good. Okay, I don't have any more agenda items.
B: Nope, I don't.

A: Have anything? Okay, all right. Well, I've got our stuff for follow-up meetings, and we're going to continue... probably next time we might do some discussion on v1 again, and maybe some of the action items that we can look at, and maybe we can scope some of those a little bit more. Okay, everyone.