From YouTube: SIG - Performance and scale 2022-09-29
Description
Meeting Notes:
https://docs.google.com/document/d/1d_b2o05FfBG37VwlC2Z1ZArnT9-_AEJoQTe7iKaQZ6I/edit#heading=h.tybh
A
All right, it's SIG scale. Oh, let me share my screen. This is SIG scale, excuse me, on September 29th. There it is.
A
Okay, so for today I think we'll do a review of the periodic job results. I saw a few failures that occurred, so we'll just see if they were just flakes.
A
Okay, a few failures. I believe I did look at one or two of these, and it seemed like there was just something with Bazel that didn't work. Okay, yeah, this is fine. It looks like it went away. Okay, that's good. I'll just quickly check one of these.
A
Yeah, okay, looks about the same. Okay, that looks good. And then, Olay, I saw you had a PR open. Do you want to go over that one real quick?
B
So there are a couple of things that I was digging into: monitoring of this workflow from a user issuing a delete request to a VMI, all the way up to when the VMI is finalized. There are two steps in this process. One is when the delete request actually comes in. It comes in as an update which sets the deletion timestamp, and then the controller, looking at that deletion timestamp, does a bunch of logic to bring that VMI into a final state, so either Failed or Succeeded. Once it goes to the Failed or Succeeded state, the finalizer is removed. A bunch of work is done, like removing all the files on the node and stuff like that, and then the object is actually released from etcd.
B
What I noticed is that there are metrics available that export the histogram from deletion timestamp all the way to final state, but there is no metric for what happens to objects from the final state to actually being released from etcd. And I think it is measurable: going from final state to release from etcd would mean looking for that final state and then looking for the delete event in the informer. So there could be a couple of interesting metrics there.
B
One is the sheer number of objects that are lying in that state, where they are not being finalized for some reason. You remember we had a race bug regarding that, where some objects were staying in the Failed state, so that could be a good metric. The other metric would be the time it takes to go from final state to releasing the object.
B
So those are the two things that I could think of. All of this came up when I was digging into exporting the metrics for audit logs: I found out that we don't actually have any metrics for the other half of the process.
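The two metrics proposed here (how many objects sit in a final state with the finalizer still present, and how long it takes to go from final state to release) can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not KubeVirt code: `VMIRecord`, `count_stuck_in_final_phase`, `release_durations`, and the finalizer name `example.io/finalizer` are all hypothetical, and the timestamps are assumed to be collected by some watcher that sees both the phase change and the delete event.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional

FINAL_PHASES = {"Succeeded", "Failed"}


@dataclass
class VMIRecord:
    """Hypothetical, simplified view of a VMI as seen by a watcher."""
    name: str
    phase: str
    finalizers: list = field(default_factory=list)
    final_phase_at: Optional[datetime] = None  # when Failed/Succeeded was first seen
    deleted_at: Optional[datetime] = None      # when the delete event was seen


def count_stuck_in_final_phase(vmis):
    """Gauge-style value: VMIs in a final phase that still carry a finalizer."""
    return sum(
        1 for v in vmis
        if v.phase in FINAL_PHASES and v.finalizers and v.deleted_at is None
    )


def release_durations(vmis):
    """Seconds from entering the final phase to the delete event, per VMI."""
    return {
        v.name: (v.deleted_at - v.final_phase_at).total_seconds()
        for v in vmis
        if v.final_phase_at is not None and v.deleted_at is not None
    }


# Illustrative data only: one released VMI, one stuck VMI, one still running.
t0 = datetime(2022, 9, 29, 12, 0, 0)
sample = [
    VMIRecord("vmi-a", "Succeeded", [], t0, t0 + timedelta(seconds=2)),
    VMIRecord("vmi-b", "Failed", ["example.io/finalizer"], t0, None),
    VMIRecord("vmi-c", "Running", ["example.io/finalizer"]),
]
stuck = count_stuck_in_final_phase(sample)
durations = release_durations(sample)
```

In a real exporter the count would likely be a gauge and the durations a histogram; plain Python values are used here only to keep the sketch self-contained.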
A
So, okay, the other half of the process. Since it's outside of KubeVirt, the question is: should we track it? Because at this point in time the control plane has done all the work it can, and now it's completely up to Kubernetes.
A
You know, this might be a little bit outside of it, and maybe there's already a metric for it that we can use. But in terms of what we need, I think this will do: from the time the user issues a delete, that first update call where we get the deletion timestamp, all the way until the finalizer gets removed. I think that's our control plane delete time. Yeah.
B
Yeah, so the current PR. Let me talk a little bit about what that PR does. We have an audit tool here that collects all the metrics. In that audit tool I have added two more sets of percentiles, that is the 99th percentile, the 95th, and the 50th, so in total six more data points: three that go from deletion timestamp to Succeeded, and three that go from deletion timestamp to Failed.
B
Currently I think we should expect all three of the deletion-timestamp-to-Failed values to be zero, because we are not introducing any failures. All of our VMIs are successfully garbage collected, and only then does the test succeed. So for a successful run it should be zero, but I have it in there in case we want to add a test that introduces a failure and also checks the failure path.
A
We could do that. Yeah, it makes sense, because we could generate this: we just need the guest, or maybe it's the launcher process, to exit 1. That's all we need to do. The container basically just needs to exit 1, and then we get the Failed state. Otherwise we get Succeeded. So yeah, it makes sense, because we're going to get both of these cases. This one is probably the most common in our testing, but we could very easily generate the other one. Yeah, it all makes sense.
B
Yeah, and so we have 99, 95, and 50. This is exactly like what we have from creation timestamp to Running state.
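As a rough illustration of the six data points described above (p99, p95, and p50 for deletion timestamp to Succeeded and for deletion timestamp to Failed), here is a minimal Python sketch. It is an assumption-laden stand-in: it computes nearest-rank percentiles from raw duration samples, whereas the actual audit tool may derive them differently (for example from exported histograms). The names `percentile` and `deletion_summary` and the sample values are hypothetical.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile; returns 0.0 when there are no samples."""
    if not samples:
        return 0.0
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def deletion_summary(durations_by_phase):
    """Map each final phase to its p50/p95/p99 deletion latency in seconds."""
    return {
        phase: {f"p{p}": percentile(durations, p) for p in (50, 95, 99)}
        for phase, durations in durations_by_phase.items()
    }


# Illustrative run with no injected failures: the Failed series is empty,
# so all three Failed percentiles come out as zero.
summary = deletion_summary({
    "Succeeded": [1.0, 2.0, 3.0, 4.0, 10.0],
    "Failed": [],
})
```

With no failures injected, the three Failed values are zero, which matches the expectation stated for a successful run.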
B
Okay, really quickly, regarding the first topic, on whether this is a Kubernetes or a KubeVirt thing: I'm actually not really sure it's a Kubernetes thing alone, because there are some informers that KubeVirt has written, like the informers in virt-handler, which deal with cleaning up the files. From what I remember, that is happening after the final state and before the object is released.
B
So let me try to dig up whether there is some action or some kind of garbage collection that is happening between final state and releasing the object. If there is something happening, then I think we should track it in terms of metrics.
B
Because the bug I remember, where objects stayed in final state, as in the object was in the Failed state but the finalizer was not removed, was precisely the bug where a race was happening in this state. My recollection is a little bit off right now, and I need to go back and dig into it, but I feel there is some logic that is executed.
A
That's interesting. My understanding was that the cleanup, any sort of KubeVirt garbage collection, was done in this period, basically between the final state beginning and the final state ending. I guess the only exception to that would be if the pod needed to be removed first. I don't know why it would, or I guess maybe it has to.
A
Maybe the guest has to be terminated first. I don't know, I guess I have to think about it. If we're waiting for the guest to terminate because we don't want to cause any damage to it by doing garbage collection, then we're waiting until that state and doing the garbage collection afterwards. So I guess it could be possible, and I guess it would be some things in the file system, perhaps, some things like...
B
Yes, exactly, the ghost records are the ones. What I remember is that virt-handler has a bunch of files on the node that represent a particular VMI. It does a watch on those socket files in order to look for events, and once the VMI is Failed or Succeeded, it will go ahead and delete those ghost records and release the object from the finalizer.
B
Although, I mean, I still need to go refresh my memory on whether the workflow I'm talking about actually happens, or whether that bug happened because something around the Failed state didn't reconcile properly. So that's something I need to check, but yeah, I'll do it this week.
A
Okay, yeah. See, there's a tricky part here. This is the end, so I see how there is, because basically we would declare something as being in the final state, and the next thing we're supposed to do is remove the finalizer. But if there's any garbage collection being done that's failing, and we fail to do this, then what state are we in? We're actually sort of in between here.
A
You know, yeah, I remember. Right, I remember we've run into this bug. This was the one where the ghost records were not cleaned up, and essentially what happens is the node restarted and a bunch of VMIs were just sitting around and got restarted, because we weren't able to launch any more VMIs.
B
The finalizer logic was waiting for that file and was not able to read it. So yeah, a bunch of things went wrong, but I think these are the metrics that would bring out these error cases in the PRs themselves.
A
It would be interesting to see. It's tough, the removing-the-finalizer step. Maybe you can check: I don't remember, when the delete metric was written, whether it hangs off of this or off the same function that does this.
B
I think the metric hangs off the state. What the metric does is this: it is an informer on the VMI, and as soon as it sees the deletion timestamp, that's the old time it will use. The phase transition to Failed or Succeeded is the new time it will use, and then it will observe the difference.
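The informer behaviour just described (old time from the deletion timestamp, new time at the transition into Failed or Succeeded, then observe the difference) might look roughly like the Python sketch below. It is not the KubeVirt implementation: `DeletionLatencyObserver`, the dict shape with `phase` and `deletion_timestamp` keys, and the explicit `now` argument are all assumptions made for illustration, and a plain list stands in for a histogram.

```python
from datetime import datetime, timedelta

FINAL_PHASES = ("Failed", "Succeeded")


class DeletionLatencyObserver:
    """Hypothetical stand-in for an informer update handler feeding a histogram."""

    def __init__(self):
        # (phase, seconds) pairs; a real exporter would observe into a histogram.
        self.observations = []

    def on_update(self, old, new, now):
        """Record latency when a VMI being deleted first enters a final phase."""
        deletion_ts = new.get("deletion_timestamp")
        if deletion_ts is None:
            return  # the object is not being deleted
        entered_final = new["phase"] in FINAL_PHASES and old["phase"] != new["phase"]
        if entered_final:
            elapsed = (now - deletion_ts).total_seconds()
            self.observations.append((new["phase"], elapsed))


# Illustrative sequence: the delete arrives as an update that sets the
# deletion timestamp, and three seconds later the VMI reaches Succeeded.
observer = DeletionLatencyObserver()
deleted_at = datetime(2022, 9, 29, 12, 0, 0)
observer.on_update(
    {"phase": "Running", "deletion_timestamp": None},
    {"phase": "Running", "deletion_timestamp": deleted_at},
    now=deleted_at,
)
observer.on_update(
    {"phase": "Running", "deletion_timestamp": deleted_at},
    {"phase": "Succeeded", "deletion_timestamp": deleted_at},
    now=deleted_at + timedelta(seconds=3),
)
```

Note that nothing is observed after the final phase is reached, which is the gap discussed earlier: the step from final state to finalizer removal and object release is not covered by this kind of handler.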
A
What I remember, though, is that when this happens, these things are supposed to happen very close to one another. They're almost in the same function call, that's what I remember. I can look it up real quick, because it's in the PR.
A
Yeah, it's almost like we're talking lines of code apart. Let's see here, various transitions. Oh, okay. No, I don't see it. So, let's see.
A
Yeah, okay, I think we'll just have to take a look at it again, because I just don't remember. But what I recall is that it was the same function call: basically, the thing that processes Failed and Succeeded, that sets it into Failed or Succeeded, was in the same function that was supposed to remove the finalizer. But there might be some steps in between, which is what you're getting at.
B
It makes me wonder whether those steps are happening asynchronously: the controller goes from Failed to removing the finalizer immediately, and virt-handler is also removing those ghost records after Failed or Succeeded, with both of them happening asynchronously. That could be one workflow, but I'm not sure. I need to go look.
B
Then I have another small update. You remember, we were seeing the endpoints metrics being reported in the audit tool. I looked at the test clients, and the test clients do not use those metrics, so those metrics are purely coming off virt-controller, virt-handler, virt-api, and one more, the webhook. The reason is that even though they all use the same client, you need to import the monitoring client Prometheus package, and that's only imported in those four packages. I posted an update on Slack asking those questions, and I think I've found the answers, so I just wanted to give a heads-up.
A
So this is the one where we were talking about, I think it was back here, which clients were doing these. Is that what it was?
B
Correct. At one point I thought that the end-to-end test clients and the clients used in the actual API server, or even the controller and handler, all share the same logic of recording the metrics by intercepting the REST call. But those metrics are only enabled for the four packages that I mentioned, which are on the server side. They are not enabled for end-to-end tests or any other packages.
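The opt-in behaviour described above, where every binary uses the same client but request metrics are only recorded in the ones that pull in the monitoring package, can be sketched in Python as follows. This is a loose analogy under stated assumptions rather than the actual mechanism: `RequestMetrics`, `Client`, `Binary`, and the `import_monitoring` flag are hypothetical names, and a constructor flag stands in for whether the monitoring package is imported.

```python
class RequestMetrics:
    """Counts intercepted REST calls by (verb, resource)."""

    def __init__(self):
        self.counts = {}

    def observe(self, verb, resource):
        key = (verb, resource)
        self.counts[key] = self.counts.get(key, 0) + 1


class Client:
    """Shared client: it records a call only if a metrics hook was wired in."""

    def __init__(self, metrics=None):
        self._metrics = metrics

    def request(self, verb, resource):
        if self._metrics is not None:
            self._metrics.observe(verb, resource)
        return f"{verb} {resource}"


class Binary:
    """Hypothetical stand-in for one process, e.g. a server component or the e2e tests."""

    def __init__(self, name, import_monitoring=False):
        self.name = name
        # Only binaries that opt in get a metrics registry.
        self.metrics = RequestMetrics() if import_monitoring else None

    def client(self):
        return Client(self.metrics)


# Illustrative wiring: a server-side component opts in, the test binary does not.
server = Binary("virt-controller", import_monitoring=True)
tests = Binary("e2e-tests")
server.client().request("GET", "virtualmachineinstances")
tests.client().request("DELETE", "virtualmachineinstances")
```

Under this model the test client's calls still succeed but leave no trace in any metric, which is consistent with the conclusion that the reported numbers come only from the server-side components.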
B
It does not. That's what I was trying to say. There was an open question whether it does or does not, and I'm saying that it does not: these metrics are only coming from four components, virt-api, virt-controller, virt-handler, and the webhook.
A
All right, that makes sense. Cool. Thanks for the updates on the clients. Okay, cool. I don't have anything else. Hey Andre, I saw you joined. I don't know if you've got anything you want to add.
C
I tried to reach you because I would like to tell you that we are releasing ddesk, finally.
C
That's exciting, yeah. The only issue that I would like to talk to you about is that it is without GPUs for now. Let me explain why. Let me put the link here in the chat for you.
C
GCP doesn't allow us, for the GPUs, to enable VT-d, IOMMU equals true, on the kernel. It doesn't work.
C
So all the games don't work, and many things, like AutoCAD, aren't going to work, but we need to start with something. We cannot wait anymore.
A
I totally understand. Yeah, okay. I guess it sounds like they're working on it.
C
Any suggestions? The reason we are not in Amazon or Azure is that they charge for the traffic between regions, and this is too expensive for us. All the files are stored in the US, and when a user logs in with his desktop in, for instance, Australia, we go through the pipes of Google and access the files inside the US storage, for governance and compliance, you understand. And there are lots of pains.
A
I'm not sure. I don't know, I'd have to think about it a little bit.
A
I don't have a list off the top of my head.
C
Anyway, we have NVIDIA G4S in the lab, and it works perfectly.
A
I think what would help me, Andre, since you've talked about your use case: could you send me an email with more detail about your use case and the requirements? That would help me, I think, further the conversation and understand exactly what you're looking for. Like, what were the requirements you had on using GCE, what were your requirements on AWS, any gaps, and things like that. That would help.
C
I'm going to write it down for you, okay. Anyway, wonderful job you are doing. My technical team is looking at the performance parts that you are in charge of. This is an amazing job.
A
Yeah, just send me an email with what I asked, and I'll try and figure something out.
A
Sounds good. Cool. Andre, thanks for the info, and congratulations again, that's exciting. It's always hard to get a product release out the door. That's cool, happy to hear it. It's pretty awesome.