From YouTube: 2023-03-30 Kubernetes SIG Scalability Meeting
Description
Agenda and meeting notes - https://docs.google.com/document/d/1hEpf25qifVWztaeZPFmjNiJvPo-5JX1z0LSvvVY5G2g/edit?usp=sharing
A
Okay, so this is the SIG Scalability meeting, 30th March 2023, and, okay, I see that someone is already starting with some topics to discuss.
B
Sure, yeah, I was just typing. So yeah, I think we need to fill in the annual report for SIG Scalability for, like, 2022.
B
I started doing that, but I didn't do anything creative at this point; I mostly filled in sections that were kind of obvious. So if anyone wants to help contribute, either by commenting or by forking it or whatever, that would be great.
B
Yeah, sure, I'm planning to work on that like early next week or something like that. I definitely won't have time tomorrow or today. So yeah, let's sync offline; like, Shyam, if you want to add anything, or anyone else, if someone wants to add to the annual report that we need to create in the next couple of days, that would be great. I linked the report and the work-in-progress PR in the notes.
A
Okay, do we have any topics?
C
There's a bunch of SIG Testing folks joining the call today, so welcome everyone. I guess this is based on the discussion we had at the last SIG call; essentially, AWS is now kind of ready to invest long term in scale testing, and this is part of that.
C
There's an issue I cut to kick off some of the discussions, and Ben and Justin here on the call have been pretty active with it and helping out with kicking things off, like setting up a test job. I think in this meeting we wanted to discuss some questions we had on how to configure these tests, and where we actually want to get to.
E
I don't know, I can also chime in; you know, K8s Infra will be pretty involved in terms of the account management and keeping an eye on the bills and that sort of thing.
C
Okay, so I guess, status: Justin, do you want to give the status of the job that you've added, and we can discuss where to take it next?
F
Absolutely. Sure. So we also have Ciprian here, by the way, from kOps. The job is very, very basic; we're able to spin up jobs in the existing Prow cluster on AWS. Basically, we created a job that just brings up a cluster and runs the conformance tests with 100 nodes.
F
I know a hundred is a long way from 5000, but it is somewhere to start. That one basically runs without a problem, although there is an asterisk on that, in that we have to use Calico, and that's sort of interesting; currently we haven't tried the VPC CNI. I think it would be great to get that job running in the billing account, at least, that we want it to be running in, and start cranking the hundred higher and higher.
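A minimal sketch of what such a job does, assuming kOps is installed and AWS credentials are configured; the cluster name, state-store bucket, and zone below are placeholders, not the real job's values:

```python
# Hypothetical bring-up of a 100-node Calico cluster with kOps, mirroring
# the described test job: create, validate, run tests, tear down.
import os
import subprocess

ENV = dict(os.environ,
           KOPS_STATE_STORE="s3://example-kops-state-store",  # placeholder bucket
           KOPS_CLUSTER_NAME="scale-test.example.k8s.local")  # placeholder name

def kops(*args: str) -> None:
    subprocess.run(("kops",) + args, check=True, env=ENV)

kops("create", "cluster",
     "--zones", "us-east-2a",
     "--node-count", "100",
     "--networking", "calico",
     "--yes")

# Wait until the cluster reports healthy before running conformance tests.
kops("validate", "cluster", "--wait", "30m")

# ... the conformance/scalability test suite would run here ...

kops("delete", "cluster", "--yes")
```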
C
We should probably do that before we bump up the scale. So yeah, I guess the CNI is one question, it seems like. What do the other kOps jobs on AWS use today, Justin, do you know? Is it Calico, or...?
F
kOps? Just, everything, would be the short answer. We don't actually test everything, but we try to test every combination. The ones which are best supported on AWS are Calico, Cilium, whatever we call kubenet today, and it supports the AWS VPC CNI. Is that right, Ciprian? I can't think of any others that are used on AWS.
D
No, these are pretty much the best-supported CNIs. One other thing that I could think of, before starting to crank things up, is to increase the limits on the account.
F
There's also one more option as well, which is IPv6, which is no CNI, but, right, I don't know, we might want to do that. So that's maybe another option, but I think today we've only tested it with Calico as our CNI; in theory we might not need a CNI at all if we use IPv6 mode.
E
For some reasons we'll want a different set of accounts, because of tracking. So, for example, on GCP it's the same thing: it's not just about increasing the quota. We don't want to increase the quota on all of the accounts, and we know that this is a particularly large bill, so it's much easier to keep track of how much goes to scale specifically by having a different project pool. And then also...
E
If we wind up needing a lot more accounts for other kinds of tests, we don't want to have to increase the quota on all of those. Because it sounds like it'll be the same here: there'll be some manual effort to bump the quota on the accounts.
E
So if we can keep that targeted to just however many accounts we think we need for the Boskos pool for scale testing, that's a much easier problem than doing it for, like, all AWS e2e accounts.
C
Okay, so just to confirm, these accounts are already...?
E
We need to create some; the 100-node one is just running in one of the accounts we've been using for kOps and CAPA CI today.
H
We have full access to the entire organization, where we can create this account and bump the quota, I think; that's not really an issue. I think the main blocker is trying to tie up everything between AWS and GCP, because it's a completely new process for us, like being able to integrate all those accounts into the existing Prow deployment. That's the bigger problem, but creating the account itself and setting up a pool of accounts is not really an issue.
E
I think the other complication is, currently, if we want to go quickly we'll just put them in a pool in the existing CI clusters. But long term it makes more sense to... we're also in the process, in SIG Testing, of setting up an EKS cluster to actually execute test workloads from, and we'll probably want to pivot to executing scale tests on AWS from the cluster that's in AWS, so that we don't have to, like...
E
We can do things like avoiding egress between them to upload the logs, and things like that, at some point down the line. So we'll probably have to pivot the accounts, but I think today we can just continue on with where things are at, make another account pool that's dedicated to this purpose, add it to the automation, and sort that out later.
F
I mean, I'm fine to run tests with the VPC CNI, and I think we should also run them with whatever we choose, Calico, because then when we see an error we get more data, right? We see a regression and we know, oh, it's only with the VPC CNI, or it's only with Calico, or it's with both, right?
F
It points us in a certain direction, and we found this very useful with what we call the grid, where we run permutations of various things, like all the CNIs and OSes and Kubernetes versions and things like that, with kOps. And so we're able to tell, to an extent, where the problem lies by looking at, you know, sort of which tests are failing and which ones aren't, or which ones are regressing and which ones aren't regressing.
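As a toy illustration of the grid idea (the dimension values here are examples, not the actual grid's contents), each point in the cross-product becomes its own job:

```python
# Hypothetical enumeration of a kOps-style test grid: one periodic job per
# combination of CNI, OS image, and Kubernetes version.
import itertools

cnis = ["calico", "cilium", "kubenet", "amazonvpc"]
oses = ["ubuntu-22.04", "amazonlinux-2"]
versions = ["1.26", "1.27"]

for cni, image, version in itertools.product(cnis, oses, versions):
    print(f"e2e-kops-aws-{cni}-{image}-k8s-{version}")
```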
E
So, SIG Testing / K8s Infra hat: I would really like to get us to a place where we're able to say, you can just take this job and switch a flag or two, and it's GCP or Amazon, and we can move between them. That's part of the reason I've been discussing using kOps; I think it's the most mature cluster tool...
E
...that we have available to the project that can target both, and it's already integrated with other test tooling and has pretty good coverage; whereas we have most of our jobs on kube-up today. So if we're going to do the lift to move away from kube-up, moving towards having relatively identical config but switching out the providers puts us in a place where, you know...
E
...if down the line we find out that K8s Infra is blowing the budget downloading things from Amazon, then we can, you know, shift some jobs back over to GCP or whatever. So I don't think we have to strictly do everything identically, but the closer we can get to matching the setup between them, and then using the same tooling and stuff, the more flexible a place that puts us in, where we're not going, oh, we need to turn down scale tests because we're in a budget crunch.
C
Got it, okay. I mean, I see value in both, at least. So yeah, there's value in the tests being the same, so that we can swap one out for the other on demand; there's also value in trying different configurations, so we cover a different surface of issues. Some issues may not be caught by the GCE tests today because of the way things are configured there.
E
Absolutely, that's something else we brought up. I think Dims said something about only using kubeadm, and when the testing leads talked, we were kind of like, we don't necessarily want to be that uniform in how we configure clusters, so that we get better coverage.
B
I think one potential concern I have is that I would like to have one configuration, whatever that will be, running as soon as possible, and then add more, because if we divide our attention across multiple things, it may take us more time to actually have the first one running, which is probably our most immediate goal.
F
That might work in our favor, though, right? If we bring up two of them and we're like, oh look, Calico isn't as scalable as the VPC CNI, it's like, hey, team Calico, do you want to look at this? And then Calico moves a little bit ahead of the VPC CNI, and it's like, hey, team VPC CNI, Calico has overtaken you and you're now in second place, and they're going to push that a little bit harder, right? So that might be a fun dynamic.
E
At this juncture we're a bit limited in what we're running on GCP. Assuming the credit situation is better next year, we should have more room. We started out early this year way over budget, and we've done pretty drastic moves to bring the run rate down, but we still have to get through this year. So, you know, our scale signal is going to be reduced on GCE; in the future that might not, hopefully shouldn't, be the case.
E
We've brought down the run rate a lot and should have room, and scale testing is something that I personally would prioritize. I think it's something that the project can't get with, like, free off-the-shelf resources somewhere else, and it's a great use of our credits. We've only pulled the lever to reduce scale this year because it's just one of the few things that you can actually reach and pull quickly. But I don't think we'll have super great signal trying to compare a matrix of things right now.
E
We'll want to be able to load-balance in the future as the funding situation evolves. My understanding is that a small difference is that, with the GCP credits, there's three million dollars deposited in the account every year, and there are public posts with that commitment. So we have a certain level of: we're getting exactly this much, and we know what we're doing.
E
The AWS credits are a little bit more complex; they're deposited in tranches, and the exact amount may be adjusted depending on usage and things like that. And we're still figuring out things like: how much is it going to cost to run the general CI things when we move more of that over? We don't have as good an idea of what the run rate for other reasonable things to run is going to look like, so I think we want to maintain a fair bit of flexibility on where we run this.
C
All right. So I think we have the next steps with setting up accounts and getting limits increased. And I think, given our decision to kind of have these jobs fungible, do we want to use the same CNI?
B
We have some suites with Calico, but the scalability tests aren't using Calico underneath.
C
Okay, so in that case, is the plan that we move the GCP jobs also onto Calico, or kind of find an equivalent of the IP-alias CNI on AWS? Because that component won't work as-is on AWS, since things are a bit different, VPCs and stuff, so we'll have to use something else.
B
I would personally start with what you are proposing, the VPC CNI. Given also what you said, that it's maintained by AWS and so on, so we know that it should be working at least, it should be easier to set up; and then, once we have that running, create the Calico-related job as a follow-up.
A
Yeah, so actually, for Calico we already have that for GCP. I think one interesting option would probably also be Cilium, but yeah, we don't pay much attention to the Calico one, and I think we also know that at larger scale it will struggle, basically, yeah.
E
So if we can get a state working on AWS that we should also be able to run on GCP, that's probably a pretty close follow-up step. SIG Testing hat again: what I'm hoping to get out of all of this is to move us out of a state where everything is kube-up, and into a state where it's a tool that can target multiple clouds.
E
So while probably the most pressing thing is getting something running on AWS, and it's totally reasonable to just use whatever is convenient, I hope we're kind of fast-following towards a state where we can take this config, replicate it back to GCP, and rotate over to using the same tooling.
C
Okay, yeah, I think that makes sense. Cool. Okay, and the last question I had for you, Justin: there's a bunch of flags that these tests are tweaking today through these bash scripts for kube-up. Is that possible through kOps as well?
F
In general it's possible. If they are flags, it can be relatively straightforward, because they're probably already mapped in the configuration, and if not, we can just add a mapping for them. Environment variables, I think, are a little bit trickier. I think someone brought up that we might need an environment variable, so we can add the support for it.
F
It's particularly tricky for things like kube-apiserver, kube-scheduler, kube-controller-manager, the sort of system components. For things like the VPC CNI, we can easily make it so that you can just override the whole manifest and choose your own manifest, so we'd probably use that route there. But in general the answer is, they're not necessarily all going to be there, but we can easily add them on the kOps side.
B
So I think the bottom line is that we are configuring, in a custom way, a bunch of API server flags in particular, so we will probably need to do that, or plan that mapping, at least. I can send you the link to the things we are changing on the API server for GCP, so I'm assuming that we probably want to make it similar on AWS.
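A rough sketch of what that mapping could look like on the kOps side, assuming the relevant flags are exposed as fields of the cluster spec's kubeAPIServer component config; the field names and values below are illustrative, not taken from the real jobs, and would need to be checked against the kOps API:

```python
# Hypothetical edit of a kOps cluster spec to set kube-apiserver options,
# the rough equivalent of exporting env vars before running kube-up.
import subprocess
import yaml  # PyYAML

CLUSTER = "scale-test.example.k8s.local"  # placeholder name

out = subprocess.run(["kops", "get", "cluster", CLUSTER, "-o", "yaml"],
                     check=True, capture_output=True, text=True).stdout
spec = yaml.safe_load(out)

# Illustrative overrides; whether each field exists must be verified per flag.
spec["spec"].setdefault("kubeAPIServer", {}).update({
    "maxRequestsInflight": 800,
    "maxMutatingRequestsInflight": 400,
})

with open("cluster.yaml", "w") as f:
    yaml.safe_dump(spec, f)

subprocess.run(["kops", "replace", "-f", "cluster.yaml"], check=True)
subprocess.run(["kops", "update", "cluster", CLUSTER, "--yes"], check=True)
```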
B
Good question. I think we don't, but I'm not 100 percent sure.
E
I think we have a few small ones, but I think all of those can be passed through in the kOps config already, and this sort of spelunking is why we want to switch to something like this. Right now, probably the biggest lift for porting any of these jobs is looking at the environment variables that we set for kube-up, figuring out what they actually do, and then converting that to some other tool.
E
So hopefully we can do that once, into the kOps cluster spec, and then that will be portable. I don't think anybody actually knows all of these; I'm one of the current maintainers of the kube-up scripts, if you can even call it that, and they're a nightmare. The environment variables tend to get string-interpolated into bash generating YAML, possibly multiple layers deep, and we're going to have to go look and figure out what they're doing and what flags are actually set.
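The conversion being described might, per variable, boil down to something like the following sketch; the environment-variable name and the spec field it maps to are illustrative assumptions, not a real mapping table:

```python
# Hypothetical kube-up env var -> kOps cluster-spec field translation.
import os

ENV_TO_SPEC = {
    # env var (assumed name)              (component,      field)
    "KUBE_APISERVER_REQUEST_TIMEOUT_SEC": ("kubeAPIServer", "requestTimeout"),
}

overrides: dict = {}
for var, (component, field) in ENV_TO_SPEC.items():
    value = os.environ.get(var)
    if value is not None:
        overrides.setdefault(component, {})[field] = value

print(overrides)  # would then be merged into the kOps cluster spec
```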
F
I don't know if I'm allowed to lobby, but I will plug Ciprian and myself doing a KubeCon talk about the direction we're trying to go in, which is that we're trying to have more manifests plumbed through. The idea being that, ideally, if we come up with the configuration for the AWS VPC CNI in particular, that will be YAML, and it will be reusable whatever tool you choose to use.
F
Obviously that's harder for the API server, where there's a lot more interpolation or mixing of values that has to happen, but maybe we can do patches or something there. That isn't there yet, but that's something that we hope to talk to many people in this SIG about at KubeCon.
E
I think the other thing we're going to have to port is that the scale tests have a custom log dumper that dumps a lot of things in a performant way, and...
E
At some point we'll probably want to see if we can get all the infra tooling happy with dumping that to S3. That's probably one of the interesting upcoming problems for getting all this running smoothly.
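As a minimal sketch of what "dumping that to S3" might amount to: mirroring a local artifacts directory into a bucket with boto3. The bucket name and key layout are placeholders, not the real jobs' configuration:

```python
# Hypothetical upload of dumped logs to S3 using boto3.
import pathlib
import boto3

BUCKET = "example-k8s-scale-logs"  # placeholder bucket
s3 = boto3.client("s3")

def upload_artifacts(local_dir: str, run_id: str) -> None:
    """Copy every file under local_dir to s3://BUCKET/<run_id>/..."""
    root = pathlib.Path(local_dir)
    for path in root.rglob("*"):
        if path.is_file():
            s3.upload_file(str(path), BUCKET, f"{run_id}/{path.relative_to(root)}")

upload_artifacts("_artifacts", "scale-100-node-run-0001")
```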
C
So Perfdash should be possible to run based off of metrics in S3 too, you know, because we actually started using it internally too. Okay, and I don't know, maybe we'd have to make some patches, but I think that should be easy to get working. It may have to be a different Perfdash dashboard, though; I don't know if it will work with both at the same time. But yeah, I think both the callouts are good ones.
E
I don't think we're super worried about jumping over to S3 right away. I mean, the log storage and stuff is not cheap, but it's not really top of the bill, and I'm not overly concerned about the egress between them. But it'll just make sense eventually to actually execute the Amazon jobs completely in Amazon and store the results in Amazon, and just...
E
There are places that we're going to have to decouple, though, or at least couple to kOps or whatever; that is, couple it to the tools, at least a multi-cloud tool, instead of to specifically how kube-up works. And right now the log dump script that we use in scale is different from the other ones. I think the trickiest part is that it SSHes to the nodes and dumps directly with credentials, to keep it manageable, versus the CI pulling everything in and then uploading.
C
Makes sense, okay, okay, yeah, I guess, yes. I think the bigger, the expensive part is with transfers and stuff; SSHing into the control-plane nodes and having them push their logs, for now I think that might be okay.
E
And more that it has to be compatible with whatever cluster tooling you're using. Right now it assumes things like the log paths that kube-up clusters have, and the SSH access that's available to all of the GCE VMs from CI, and that sort of thing. We'll have to pull on that thread a little bit to make sure that the scale log dumper works on the kOps jobs on AWS.
C
Okay, yeah, I think that makes sense. All right, I think we're at time. A lot of interesting topics this time; I think we covered a lot of ground on what we need to do after this meeting. I can take some of this and just update that issue with the next steps we need to take, and I might have to ask for some help with respect to the setup of accounts.
E
I'm going to point you to Arnaud for that, probably, though he may delegate it somewhere else.
H
I would take care of that. That's not really an issue; I'm working on that currently, it's just that I've been busy with other stuff.
C
Awesome, thanks, Arnaud. Once you provision them, if you can let us know and share the account IDs, then we can get some of the limits increased for them.
H
I will. What I'm trying to do is automate account creation and also the Service Quotas requests at the same time, and basically leave that, so we allow for the three days for requests, because I cannot upgrade the support plan to get the faster responses from support. So what I'm doing is: create the account, create the quota increase request, and wait until we get it done by support; if it's too slow I would ping them, so we'll see if we can just do that internally.
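The automation being described could look roughly like this sketch, using the AWS Service Quotas API; the quota code shown is the one commonly used for on-demand standard-instance vCPUs, but it and the target value are assumptions to verify:

```python
# Hypothetical quota-increase request via the Service Quotas API.
import boto3

sq = boto3.client("service-quotas", region_name="us-east-2")

resp = sq.request_service_quota_increase(
    ServiceCode="ec2",
    QuotaCode="L-1216C47A",  # assumed: Running On-Demand Standard instances (vCPUs)
    DesiredValue=10000.0,    # illustrative target for large scale tests
)
print(resp["RequestedQuota"]["Status"])  # stays PENDING until support approves
```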
C
Okay, thank you. Cool, thanks, folks. We should wind up; we're already over time. Marcel, are you able to stop the recording, or...?