From YouTube: SIG - Performance and scale 2022-04-14
Description
Meeting Notes:
https://docs.google.com/document/d/1d_b2o05FfBG37VwlC2Z1ZArnT9-_AEJoQTe7iKaQZ6I/edit#heading=h.tybh
A: It's been four or five weeks or so, and I haven't... so probably the only update I have... A few things I want to do: I want to go through the periodic job results, because we haven't looked at them. I just wanted to sync up on that again and see if anything's changed here. I did look at this earlier, just to get an idea.
A: Everything seemed okay. We're still spending roughly the same amount of time in the phases, and the number of updates and the number of API calls look roughly the same.
A: I was going to go to that one next. Okay, pretty good: 3:39, well under. Still looking good there, okay. And then, let's see, the performance cluster. Is this one right? Oh, this is... oh, one of my patches must have merged, then.
B: I cannot approve things, yeah.
A: Let me get some LGTMs. Okay, let me get... I'll have to get David or Roman to just take a quick look at it, but yeah, that's what we need. That's what's missing, okay.
A: Yeah, okay! Well, I'm glad that merged, and it looks like everything from the syntax perspective is working as expected. We just assumed that with that one patch, so okay.
B: Okay, yeah, the thing is: do you see that there is some big variation in the previous test?
B: Maybe it's something with the cluster, isn't it? You know, like the cluster was busy, something happened to the nodes. But it's a bare metal node; it shouldn't have any difference. So is this...
A: So when I look at this, okay, so we've got... transitioning from scheduling to... actually, the scheduling phase is what we see. Well, actually, no, the new schedule as well. This is one pattern we do see: you can see the dark blues, the large amount of time spent there. The light...
A: Oh sure, yeah, okay. Well, I have a different point I want to make about this, but one of the things that we see internally, when we look at our clusters and their performance, is that when this light blue starts to increase is when we run into problems. This is when we see Kubernetes actually running into issues with performance. That's the light blue; I can't really say about the dark blue. I think it's a mix, we actually...
A: This is almost exclusively Kubernetes. So on yours, the light blue... yeah, you have the same; looks like you're...
A: So what about... you don't have pending in here?
B: Yeah, for some reason pending doesn't appear for me. So this...
A: So this would be, if I interpret this correctly, the time it takes from pending, or from scheduling, to scheduled, right? So is this scheduling time, or is this the time from pending to scheduling?
A: Yeah, yeah, but you're using creation... using the creationTimestamp to the scheduled timestamp, right? Okay, that's interesting, and then you put them together. I see.
A: Is that the time we spent in scheduling, though? I think the reason why you don't see pending is because, I think, the light blue is your pending, and I think your green is actually scheduling, like time spent scheduling, and this is yours.
A: We should double-check this, because I just have kind of a feeling... I'm wondering if this time here is actually the same as this light blue.
A
That's
what
I'm
wondering
I
don't
know
I
kind
of
want
to
see
the
yeah
what
you
have
on
the
because
you
said
it
from
time
that
the
creation
time
stamp
to
like,
because
if
you
do
the
creation
time
stamp
to
do
the
scheduled
transition
that
would
be
this
could
be
this.
The
phase
transition
time
stamp
from
scheduled
right
is
like
the
moment
we
transitioned
into
schedule
right
yeah,
so
all
the
time
beforehand
would
be
10
would
be.
Creation
pending
scheduling
would
be
this
yellow.
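If the dashboard really is built from the VMI's creationTimestamp plus the phase transition timestamps in its status, the ambiguity A describes can be made concrete with a small sketch. The field names below follow the shape of a KubeVirt VMI status, but the input is a hand-built dict, not a live object, and the attribution rule (a phase "owns" the gap before the VMI entered it) is one plausible interpretation, not necessarily what the dashboard does:

```python
from datetime import datetime, timezone

def parse_ts(ts):
    """Parse an RFC 3339 timestamp as emitted in Kubernetes object status."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

def phase_durations(creation_ts, transitions):
    """Return seconds elapsed before each phase transition fired.

    Each entry is the gap between the previous marker (creation, or the
    prior transition) and the moment the VMI *entered* the named phase."""
    durations = {}
    prev = parse_ts(creation_ts)
    for t in transitions:
        cur = parse_ts(t["phaseTransitionTimestamp"])
        durations[t["phase"]] = (cur - prev).total_seconds()
        prev = cur
    return durations

# Hypothetical VMI: entered Pending at +2s, Scheduling at +3s, Scheduled
# at +15s -- so the "Scheduled" bucket carries the 12s scheduling gap.
durs = phase_durations(
    "2022-04-14T10:00:00Z",
    [
        {"phase": "Pending", "phaseTransitionTimestamp": "2022-04-14T10:00:02Z"},
        {"phase": "Scheduling", "phaseTransitionTimestamp": "2022-04-14T10:00:03Z"},
        {"phase": "Scheduled", "phaseTransitionTimestamp": "2022-04-14T10:00:15Z"},
    ],
)
print(durs)  # {'Pending': 2.0, 'Scheduling': 1.0, 'Scheduled': 12.0}
```

Under this attribution, "no pending bar" on one dashboard and "pending folded into scheduling" on the other are just two choices of which marker starts each bucket, which is exactly the confusion in the exchange above.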
B: Creation is from when the object was created, when the object first appeared in the request, you know, and then the phase transitions. But pending never appears for me. So I don't know; that's weird.
B: There is also the scheduling time, because there is a metric from the scheduler that is the scheduling time; it's a Kubernetes metric. If your dashboard doesn't have it, it might be interesting to check, because what I'm saying here is... okay, I'm not sure, but I think the scheduling time is very small here. Then we don't see any pending, but we see the KubeVirt components doing things, you know, some slowdown here.
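The scheduler metric B is referring to is exposed as a Prometheus histogram (the exact metric name varies by Kubernetes version, e.g. `scheduler_scheduling_attempt_duration_seconds` in newer releases). A quantile from its cumulative buckets can be estimated the same way PromQL's `histogram_quantile` does; the bucket counts below are made up for illustration:

```python
def histogram_quantile(q, buckets):
    """Estimate quantile q (0..1) from cumulative histogram buckets.

    buckets: (upper_bound, cumulative_count) pairs sorted by bound,
    interpolated linearly within the target bucket, as PromQL does."""
    total = buckets[-1][1]
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, cum_count in buckets:
        if cum_count >= rank:
            in_bucket = cum_count - lower_count
            frac = (rank - lower_count) / in_bucket if in_bucket else 0.0
            return lower_bound + frac * (upper_bound - lower_bound)
        lower_bound, lower_count = upper_bound, cum_count
    return buckets[-1][0]

# Made-up sample: 90 of 100 scheduling attempts finished within 100ms.
buckets = [(0.01, 50), (0.1, 90), (1.0, 100)]
p95 = histogram_quantile(0.95, buckets)
print(round(p95, 3))  # 0.55
```

If the scheduler's own p95 comes out in milliseconds while the dashboard's pending-to-scheduled time is in seconds, that gap points at the surrounding components rather than the scheduler itself, which is B's suggestion here.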
A: It'd be interesting to see what you find in the Prometheus data. I find that weird, I mean, because they definitely have the timestamps. They've got to, right? They should definitely have it. So it's kind of weird; maybe we have a bug there or something. It's interesting. Okay, well, yeah, but you're right. Anyway, back to your earlier point: your first point was that, right, we're seeing a big increase right here, where...
B: Oh, this is 200. If you go back a little bit... sorry, yeah. So you see there is a stack here that is very small between these two big ones. Yeah, this one; this is the 100.
B: We don't even reach seconds. For the pending and scheduling phases it's in milliseconds; we don't even reach this, it takes less than a second. And that's at the 95th percentile. You're at the 90th percentile and you have almost 20 seconds.
A: Yeah, I guess it's not fair to say; I haven't done the exact same test. Let me do... I'll have to do a comparison on the hardware, because I haven't done the exact same test. But roughly, what I'm saying is that when we're doing our creations, I don't know, we do maybe a bunch at a time, a handful at a time, less than 100, but it's in the mid seconds that it takes for a lot of these phases.
A: Yeah, this is interesting. I mean, so what's this one? This is 600.
A: In some cases, yeah. I mean, I guess regardless, though: is this what we expect? If we can reproduce it and you can reproduce it, that's good, but it would be good to do some analysis of what is happening while we're sitting in the scheduled phase, right? What's happening in the scheduled phase? We are transitioning from the VM, the VMI, to the virt-handler.
B: The tracing part should be useful, and then we saw that there is this guy who wants to work on the tracing. Maybe we should come up with a plan, maybe give it to him, you know, because if we know someone wants to work on that, it will be very helpful, especially to get more traction, you know, in our community. And you already started something with the tracing, but I think maybe we can point him, before going to OpenTracing kinds of things...
B: ...to have more tracing points, analyze the logs, and... you know, I don't know how advanced he is or who will make this, but we can, yeah.
A
Definitely
if
we
can
do
if
we're
able
to
open
there's
nothing
great,
I
think
that's
a
that's,
definitely
definitely
a
big
effort,
but
if
he
is
open
to
doing
that,
work
it'd
be
awesome,
but
if
not
like
yeah,
we
can
do
like
the
the
poor
hands
tracing,
which
is
the
the
tracing
that
I
did
in
her
controller,
which
we
could
add
it
to
the
handler.
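The "poor man's tracing" idea, timestamped markers around the interesting steps that get diffed afterwards, can be sketched in a few lines. The step names here are invented for illustration; the real version in the controller emits through its own structured logger rather than a global list:

```python
import time
from contextlib import contextmanager

TRACE = []  # (step, seconds) pairs, appended as each block finishes

@contextmanager
def trace(step):
    """Record how long the wrapped block took: one start/stop pair per
    interesting code path, a minimal stand-in for real tracing spans."""
    start = time.monotonic()
    try:
        yield
    finally:
        TRACE.append((step, time.monotonic() - start))

# Hypothetical steps on a VMI start path.
with trace("sync-vmi"):
    with trace("render-pod-spec"):
        time.sleep(0.01)
    with trace("post-pod"):
        time.sleep(0.02)

for step, secs in TRACE:
    print(f"{step}: {secs * 1000:.0f}ms")
```

Inner blocks finish first, so nested spans come out innermost-first; correlating the entries by a request or VMI key is what turns this from timing printouts into something trace-shaped.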
A
It
would
be
interesting
to
see
some
of
the
cases
where
you
know
where,
with
these
vms,
to
see
if
we
like
actually
hit
a
slow
something
slow,
it
needs
to
be
like
a
lot
of
research.
We
need
to
do
in
terms
of
like
what
are
the
paths
that
we
need
to
look
out
for
yeah.
So
there's
still
some
there's
some
good
work
that
we
need
to
do
there.
A
Okay,
all
right!
Let
me
go
to
the
next.
This
is
something
I
saw
that
I
thought
was
interesting
that
I
just
wanted
to
mention,
so
we
were
seeing
this
in
one
of
our
data
centers.
Recently,
it's
a
large
data
center,
it's
like
over
700
nodes
and
we
would
see
periods
of
low
churn
and
high
vmi
counts.
One
of
the
vert
controllers
creates
a
ton
of
patch
requests
like
in
a
crazed
out.
A
The
numbers
out-
I
don't
so
I
I
hadn't
figured
out,
but
it
it's
patching
like
crazy,
but
what's
interesting
about
this
pattern,
is
that
you
can
see
that
here
it's
patching
this
green
line
and
there
are
right,
it
falls
pretty
quickly
and
as
it
falls,
it
actually
corresponds
with
with
this.
A
This
grafana
board
that
like
so
you
can
see
so
this
right
here
this
period,
where
we
have
this
green
line.
This
is
the
equivalent
of
the
high
patch
counts
and
when
it
falls
so
when
this
this
this
green
line,
the
rest
client
request
falls
the
others
rise
over
api
over
handler
those
increase,
and
you
can
see
in
the
phase
transition
times.
There's
a
change,
so
this
line
the
scheduling
time
increases.
We
have
this.
Our
signature
looks
like
this.
A
We
have
a
high
amount
of
requests
from
bird
api
controller
handler
fairly.
I
mean
they
did.
A: Periods, yeah, like during this period. So that's... I mean, that could be true, but I just find it weird. This area, I said, well, this is low churn during this period. That's the other part of this: these areas right here, with the high scheduling times, represent high churn. Things just take longer in Kubernetes; that's just what happens. And we are still creating VMs during this time, so that's true, but we're not creating as many VMs.
A: During this time it remains fairly steady, like within a few hundred, so it's not increasing or decreasing really quickly; it's remaining fairly steady. And for some reason the REST client requests are extremely high, and this is what I found when I dug deeper: it's just patching, and it's only one of the virt-controllers, not all of them. One of the controllers was just patching away. No...
B: No, it's Kubernetes. This is a Kubernetes property, something that we also discussed internally in IBM. You know, Kubernetes controllers work like that: it's only a single instance by default, and to have multiple controllers it would need to shard the data across different controllers. It gets complicated, well...
A
That's:
okay,
that's
that's
fine,
but
like
what
I
what
it
doesn't
make
any
sense
to
me.
Is
that
like?
Why
would
why
would
they
request?
Why
would
they
be?
Why
would
we
be
patching
a
ton?
Almost
it's
not
really
idle
time,
but
at
like
low
at
low
turn
at
when
when
you'd
expect
not
a
lot
of
vms
and
then
why
would
be?
Why
would
we
decrease
the
number
of
requests.
B
This
phase
transition.
Actually
it's
telling
the
performance,
it's
the
latency,
not
how
many
vm
is
being
created
so
because
what
is
what
I'm
thinking
is?
Okay,
so
imagine
I
don't
know
just
guessing
here.
Imagine
a
scenario
that
it
you
know
you
you
need
to
create.
One
thousand
gems
and
the
system
you
know
cannot
cannot
cope
with
that,
because
it's
especially
because
it's
busy
we
can
see
here
that
things
are
very
slow,
but
suddenly
you
know
their
requests.
B
The
client
requests
decrees.
But
here
you
are
still
having
like
a
lot
of
pending.
You
know,
requests
in
the
queue
and
then,
when
the
system
you
know
becomes
a
little
bit,
you
know
less
overloaded.
It
can
now.
You
know
it
can
process
all
these
requests
that
are
pending.
You
know
and
then
you
see
dispersed.
B
I
don't
know
just
guessing.
You
know
something
with
you
had
some
requests
pending
and
now
you
will
need
to
process
it
unless
it
unless,
if
you
are
a
senior
request,
because
you
should
only
see
any
new
requests
and
then
it
gets
higher
this,
I'm
just
thinking
that
maybe
it's
something
that
it's
on
the
queue
you
know
that's
now.
It's
been
processed
now.
B: Also, if you can get this number of requests broken down, you know, by the call, the REST call...
B
You
know,
because
you
have
here,
you
know
aggregated
divert
controller
root
api,
but
if
you
can
just
get
this,
for
example,
create
you
know
just
to
make
again
to
make
sure
that
maybe
you
see
or
delete
so
you
just
check
what's
happening,
because
maybe
it's
deleting
that
you
see
a
lot
of
you
know
this
high
request
here.
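Breaking the aggregated REST-client counts down per verb, as B suggests, is just a group-by over whatever source is available (per-verb client metrics, or the API-server audit log). A toy sketch over audit-log-shaped records; the fields are heavily abbreviated and the service-account names are illustrative, a real audit event carries far more:

```python
from collections import Counter

def requests_by_verb(events, user_prefix="system:serviceaccount:"):
    """Count API requests per (user, verb) pair -- enough to tell whether
    a spike is patches from one controller or deletes from elsewhere."""
    counts = Counter()
    for e in events:
        user = e["user"]
        if user.startswith(user_prefix):
            counts[(user, e["verb"])] += 1
    return counts

# Hypothetical audit slice: one controller patching more than it creates.
events = [
    {"user": "system:serviceaccount:kubevirt:kubevirt-controller", "verb": "patch"},
    {"user": "system:serviceaccount:kubevirt:kubevirt-controller", "verb": "patch"},
    {"user": "system:serviceaccount:kubevirt:kubevirt-controller", "verb": "create"},
    {"user": "system:serviceaccount:kubevirt:kubevirt-handler", "verb": "update"},
]
counts = requests_by_verb(events)
print(counts.most_common(1))
# [(('system:serviceaccount:kubevirt:kubevirt-controller', 'patch'), 2)]
```

Grouping on the resource as well as the verb would answer the follow-up question in this discussion: which objects the lone controller is patching during the quiet periods.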
A: It's literally... during this whole period of time we're creating and deleting, and during the periods where you can see that little higher scheduling time, it's when we're creating and deleting and we're creating more. We're getting a lot of pressure because there's a lot of creates and deletes; here, there are very few. They're still creating; as we can see, there's lines, there's data being populated.
A
It's
just
strange
that,
like
it's
there's
the
signature
matches
that
there
is
like
a
period
when
we're
at
a
low
turn
for
some
reason,
the
patched
then
tax
requests
shoot
through
the
roof
in
the
vert
controller,
and
everything
else
is
not
doing
any
work
or
not
much
at
all
like
it's.
It's
still
doing
work
just
on
the
whole
lot.
You
know
very
little,
but
for
some
reason
this
is
doing
a
lot
of
work
and
then
and
then,
when
we're
back
doing,
you
know
more
work,
the
api,
the
word
api
you
can
see.
A
It
comes
back
to
life.
Quite
a
bit.
Bird
handler
comes
back
to
life
quite
a
bit
and
then
controller
dives
back
down,
which
is
a
little
bizarre
like
I
would
expect
for
a
controller,
maybe
to
go
up
right
instead
of
come
down,
so
I
just
find
it
a
little
weird
that
we're
like
what
is
it?
What
are
these
patch
requests
that
were
that
are
happening?
That's
what's
a
little
bizarre,
so
I
don't
know
I
I
don't
know
what
it
is.
A
I'm
the
reason
I'm
bringing
it
up
is
because
if
we
something
to
look
out
for
because
something
to
do
a
little
more
research
on,
because
I
just
find
it
if
we're
viewing
a
ton
of
patch
requests
here-
maybe
we're
like
you
know-
maybe
we're
just
doing-
maybe
we're
updating
something
too
often
that
maybe
we
have
a
code
path,
that's
constantly
updating
or
making
patches
or
whatever
changing
bmi's
doing
something,
and
that
isn't
that
isn't
activated
or
isn't
running
when
we're
when
we're
creating
a
lot
of
vms
or
something
I
don't
know,
it's
a
little
weird.
A: If I can reproduce this exactly in the steady-state job, then it would be interesting to have you do it in your data center and see if you can get the same thing, to see if it's something just on our end, or if there's a problem somewhere. But anyway, I figured I'd mention it, to keep an eye on it, because this is counterintuitive. This does not look quite right.
A: Yeah, it's a little weird, yeah. Okay! Well, I'll leave it here. I'm going to make an issue. I'll do, like I said, a little bit more investigation, maybe try to reproduce it with the steady-state job, and then create an issue out of it, and then we can go from there. It would be really cool if I could actually reproduce this, if I'm able to do this. So this is a job...
A: If we could reproduce this in one of our jobs, the periodic, that would be really cool to see.
A: Okay, all right. And then the last thing is PRs. We still have this one open, so I need to know...
B
It's
a
test
that
you
create
vms
and
you
live
there.
It's
what
you
have
in
your
cluster
isn't
so
you
have
like.
Maybe
an
old
vms
know
that
being
created
there
and
it's
there
forever
and
and
then
it's
kind
of
the
stability
of
the
cluster.
You
know,
then
I
don't
know
if
this
behavior
is
related
to
that.
But
it's
maybe
you
know
that's
why
we
don't
see
because
normally
the
test
that
we
do
it's
we
create
see
things
and
destroy
everything.
A: We should be able to do that with a little bit of tweaking to the steady-state test, right? I think that would be a good one to add, like kind of another offshoot of it, because that's really what this is: it's steady state, but kind of like you said, where they run a little bit longer. We'll let the VMs run for a certain amount of time, you know, hours instead of just minutes.
A
Let's
see
what
see
what
happens,
yeah
kind
of
like
ability
test
some.
You
know
some
little
offshoot
or
a
little
leg
of
steady
state.
We
can
do
with
the
burst
test
as
well.
I
mean
same
concept.
Just
kind
of
you
know
burst
is
going
to
leave
them
around.
You
know.
A: So, okay, yeah. I'll do some follow-up and see what I can find, and I'll tweak this a little more and see if I can get a little more data on the patch requests, on what's being patched, because that wasn't really clear to me; I didn't have time to fully dig into it. So yeah, if I find one of these, I'll create an issue.
B: I don't know which cluster you have, but you can probably, you know, increase the log verbosity of the virt-controller, and if it's, I think, a verbosity higher than three or five, you can see the requests, I'm sure. So maybe, I don't know, you can check how the logging is implemented, but yeah, and...
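For reference, raising the per-component verbosity is done through the KubeVirt custom resource. The field names below are from memory of the KubeVirt API and are worth double-checking against your version's API reference before applying:

```yaml
# Patched into the KubeVirt CR (namespace and name vary by install).
spec:
  configuration:
    developerConfiguration:
      logVerbosity:
        virtController: 5   # higher levels start showing per-request detail
```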
B: You can see, like, you know, more specific details, but the API for sure. You know, the kube API, the different APIs... I'm sorry, the kube API calls.
A: It wasn't to this extent. It was happening fairly consistently, like maybe almost a week ago, but I haven't seen it since; I haven't seen it in, like, the last few days.
A
Yeah,
I
don't
know
we'll
see
if
I
can
find,
but
I
don't
know
this
one.
I
figured
I
took
a
few
pictures
because
it's
a
little
weird
so
anyway,
okay
enough
enough
on
that
one,
so
the
okay
so
prs,
these
are
the
three
pr's.
So
we
already
talked
about
this
one
need
another
plus
one
same
with
this,
I
mean
I
think
that
we've
already
talked
about
this
a
million
times
and
then
I
think
this
one
merged
right.
This
was
the
one
that
merged.
B: Yeah, the kubevirt community, and I also think... I think the Kubernetes one is also under the kubevirt community, you know, and then it has a directory of... something like that, and then inside that it has... I think it makes sense, because Kubernetes is doing that, but I don't know, I don't have a strong feeling about it. Maybe we can ask, you know, David and Roman about that.
A: ...for a few weeks. So okay, I'll check this out and talk to Jean and Roman; I've got to talk to them anyway for these, for this one. So cool, okay, all right. I don't think we have anything else.
A: I think for next time we'll see if we can grab, what's his name, is it Kim, from...? I only know his IRC name.
B: Yeah, maybe we can point him, you know... if we point him at some Kubernetes code, you know, just to show how Kubernetes is implementing the tracing, and ask him: can you prepare, you know, some design ideas?
A: He liked the comment that I put, which is that we have him post his comments in sig-scale, and then we can follow up on Slack. So maybe I can tag him on there and we can just start the conversation that way. I don't know his email. Do you know his information? Does he have any? No, he doesn't. Okay, yeah.
A: We'll go... we can try, you know, it's fine, yeah, exactly, yeah. We'll start with Slack, just to see what he's interested in with the tracing. Okay, cool, all right guys, I think that's a wrap. Thanks for attending.