From YouTube: Kubernetes SIG K8s Infra - 20230426
Description
A
Welcome everyone, we are in the SIG K8s Infra meeting. It is April 26th. Just a reminder that this meeting is under the code of conduct and will later be published publicly on YouTube.
A
Thank you. Before we start, do we have any new people in the meeting who are interested in introducing themselves?
C
Hey everyone. I was at KubeCon, went to the SIG meetings, and met Mario, who said to join the SIG. Nice to meet everyone.
A
Welcome.
A
Okay,
so
next
is
billing
dealing
with
product.
What
do
we
have
here?
Is
anyone
seeing
that
correctly
because
I'm,
like
you
know,
no,
is
it
too
big,
too
small?
It
is
okay,
okay,.
A
Yeah, for the GCP infrastructure: for so long we were always beyond 250K monthly, and for the first time, April will be the first month where I will not receive an alert that I'm over budget for the month. By May 1 we will be spending something around 150K monthly, and that's a good trend.
A
I think I said that in the last meeting, but I want to say it again: yes, it's time to add more infra, but we will not do that now. For the last three months we were spending too much. I didn't check Ben's comment on Slack that we're spending too much. Oh, Ben is here, so let me finish. Okay, but the one thing I noticed is, if I check the...
A
So it's not really a problem; it's more a consequence of the current architecture. We have, for example, one region serving five countries in Europe, because we don't deploy to every GCP region in Europe. We might fix that later, but it's a consequence. Starting next month we will start adding more regions in GCP, so we might see that go down.
G
A couple of comments. One, we are actually even slightly higher than we should be at the moment, because we still have the 1.23 CI that hasn't gotten cleaned up. So release is running an additional CI version beyond what we would normally have. Sometimes we have an extra release, but right now we have, like, an extra extra release, so GCP costs are actually pretty elevated at the moment, and this should trend even better in the future.
G
However, on the opposite end, where we're talking about spinning up more infra: we want to do that on Amazon. We actually still have a very large amount of infrastructure that hasn't moved out of Google. Some of that will be taken over running inside the google.com organization instead of kubernetes.io, so billed directly to Google; billing-wise that isn't a problem, my understanding is.
G
We can continue to run that, but it continues to be a liability for the project that we have these ancient projects from the beginning of Kubernetes, predating even the Cloud Native Computing Foundation, that no one knows much about and that we don't have good visibility into. We want to replace those with things running under the community. A lot of that spend is dl.k8s.io, so the Fastly contract should cover that, but we will need to move, like, you know...
G
We still have a very large amount of CI, things like that, and some of it we can move to Amazon, some to Google. So we have a good buffer and we should be in good shape for the kubernetes.io GCP bill this year, but we should not jump on running more on GCP this year, so we have some buffer going into next year.
A
Yeah, let's move down to our services on AWS. We are fine. This is the credit we received in January. For people new to the call: we got 3 million from a donation from AWS that we're supposed to use this year; we got the first part, 250K, in January. Currently we are not using that much except for some CI, because we have to migrate some AWS accounts from the CNCF organization to the community organization.
A
Yes, because I did an experiment using some EC2 instances. I deleted all of them; that's why you see a drop.
A
So if you see this, it's normal: there's a drop in cost because I did some cleanup on some of these accounts. Also, it's possible that tests running might have caused some change. We are not really tracking a lot of who exactly is using which infrastructure right now.
A
To
see
yeah
yeah
regarding
your
costs,
do
we
do?
We
want
to
consider
currently
existence
cause
really
high,
because
I
don't
think
it's
I
to
have
to
spend
4k
on
ekx
per
month,
I
mean
with
a
3
million
budget
with
a
2
million
budget
per
year.
Do
we
want
to
consider
eks
spending
4K
per
month
is,
is
a
high
cost.
H
Well, to be honest, still, to me: I know that we have plenty of budget, but 20K per month for a 20-node cluster is still a bit high. So we want to take a look at that, but it is not a priority right now.
H
So I don't know for sure. We will probably have enough credits, but it might be a problem if we add more workloads, especially since we don't know how much workload we're going to add. You can definitely say that this is something that has a lot of room for savings, because we didn't really pay much attention to cost optimization. We wanted to get it running as soon as possible, but we will see about that.
D
What surprises me is the price for EKS, because I don't know what is actually included in those 4K, but I thought it costs around, like, 10 cents per hour of running the control plane, so I don't know why the spend is so high on the EKS service.
A
Well,
because
we
have,
we
have
CI
tests
trigger
creating
eks
clusters,
cluster
API
provider
for
AWS.
If
I
remember
correctly,
they
basically
have
like
this
bootstrapping
eks
cluster.
That's
why
we
have
and
that's
I
think
that's
the
control
plane.
H
And also, to add something, there is stuff like, for example, the nodes that we are using in the EKS clusters: they are like one dollar per hour or a bit more, so that is 20-something dollars per hour for 20 nodes. And also we have EBS stuff, like provisioned IOPS and things like that, that we don't really use; that is also a significant cost that we can probably just drop. And another issue that we have is networking: NAT gateways and such tend to cost a lot.
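The numbers in this exchange can be sanity-checked with some back-of-envelope math. This sketch assumes the figures quoted above (roughly $1.00/hour per node, 20 nodes, and the standard $0.10/hour EKS control-plane fee); the exact instance pricing is an assumption, not billing data.

```python
# Rough monthly cost sketch for the EKS setup discussed above.
# Assumed inputs: 20 nodes at ~$1.00/hour, one EKS control plane
# at $0.10/hour, ~730 hours in a month.

HOURS_PER_MONTH = 730

def monthly_eks_estimate(nodes: int, node_hourly: float,
                         control_plane_hourly: float = 0.10) -> dict:
    """Return a per-component monthly cost breakdown in dollars."""
    node_cost = nodes * node_hourly * HOURS_PER_MONTH
    cp_cost = control_plane_hourly * HOURS_PER_MONTH
    return {
        "nodes": round(node_cost, 2),
        "control_plane": round(cp_cost, 2),
        "total": round(node_cost + cp_cost, 2),
    }

estimate = monthly_eks_estimate(nodes=20, node_hourly=1.00)
print(estimate)  # nodes dominate at ~$14,600/month; the control plane is only ~$73
```

This matches the point made in the discussion: the EKS control-plane fee is a rounding error, and the spend comes from the worker nodes (plus EBS and NAT gateways, which this sketch does not model).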
A
I think that's it. Now, I think the AWS API can help us here: we can set up some policies preventing bootstrapping new resources, to make sure we don't overspend. But before we do all of that, I think we can start moving some CI and see what exactly is happening, because currently we just have, yeah, the CAPA tests plus some other tests, like minikube and a couple of others, that we currently run.
A
Now,
if
you
move
some
CI
from
gcp
to
address,
we
might
see
cost
increase
and
that's
going
to
help
us
establish
the
Baseline
of
cost
per
month,
and
from
that
we
can
basically
say
what's
the
three
short
monthly
for
Enterprise
I
mean
before
we
jump
on
cost
optimization
I.
We
want
to
establish
a
trend
in
terms
of
cost,
because,
right
now
it's
like
it's
really
random.
G
It does seem like something to keep an eye on, though, given that we should be at a fairly small scale, and that is generally how we maintain, you know, cost controls: we just talk about the bill here and keep an eye on things, same as on GCP. Do we have autoscaling on this cluster? If we don't, then I feel like that would be the thing to invest in.
G
It seems like we don't have a very good answer for that on AWS currently.
A
Yeah
we
we
talk
about
this
many
times
like
about
reporting
on
the
AWS,
so
I
think
JFR
has
a
plan
for
that.
I'll
I
will
even
drive
this
and
try
to
come
up
with
some
reporting
around
how
we
can
break
down
costs
between
database
account
everywhere.
Services
and
possible
group
of
resources
like
how
we
can
identify
I
would
say
resource
used
by
specific
seagulls
approaching.
G
In theory, we have some support from people at Amazon. Can we talk to them about this? Because, I mean, surely there has to be something. There has to be some way to label, like, sub-accounts or something and display that. I hope we don't actually have to build something for that, that I can just pull up the cloud billing report in the cloud console and see this.
A
Yeah, that's what I'm saying: I know Jeefy is driving this and is supposed to talk to some AWS folks about it. Now, do we want to make that a priority for 1.28, or do we want to just make sure we keep poking the AWS folks about this?
G
I mean, I feel like if there's anything we need to change about how we spin up infrastructure, for example if things need to be in different sub-accounts or something like that, we need to get a handle on that before we spin up too much stuff, because eventually this is going to be a problem. We need to be able to actually identify where the spend is going.
A
It just takes some time to implement the current policy around this, but there's an issue, some more about tagging, and we started doing some of that. We have introduced a few tags to make that happen; we just now need to tag everything during account creation, and also retroactively, which is not done, granted.
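As an illustration of the kind of breakdown this tagging enables, here is a minimal sketch. The `sig` tag key, the record layout, and all the resource names and dollar amounts are hypothetical, not the project's actual tagging scheme; the point is only that untagged spend stays visible as its own bucket.

```python
# Minimal sketch of cost attribution via resource tags, assuming a
# hypothetical "sig" tag applied at account/resource creation time.
from collections import defaultdict

def spend_by_tag(records: list, tag_key: str = "sig") -> dict:
    """Sum cost records per value of tag_key; untagged spend is
    bucketed under "untagged" so gaps in tagging stay visible."""
    totals = defaultdict(float)
    for rec in records:
        owner = rec.get("tags", {}).get(tag_key, "untagged")
        totals[owner] += rec["cost"]
    return dict(totals)

# Hypothetical billing records, purely illustrative:
records = [
    {"resource": "eks-cluster", "cost": 4000.0, "tags": {"sig": "k8s-infra"}},
    {"resource": "nat-gateway", "cost": 900.0, "tags": {"sig": "k8s-infra"}},
    {"resource": "ci-capa", "cost": 1200.0, "tags": {"sig": "cluster-lifecycle"}},
    {"resource": "old-bucket", "cost": 300.0, "tags": {}},
]
print(spend_by_tag(records))
# {'k8s-infra': 4900.0, 'cluster-lifecycle': 1200.0, 'untagged': 300.0}
```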
A
I think there's a way to basically have those as a template. Also, regarding that: I never pushed for it, I never really wanted to do advanced usage of Cost Explorer. It is possible to have a breakdown per account, then...
A
Well, I think it's about which information you want to get from all this data, really.
G
Well, we want to identify which parts of the project are using a reasonable amount of spend or not. So, like, I'm not surprised that running the registry is a little bit expensive, and it looks in line with reasonable costs. But if k8s-infra-prow suddenly cost like 10x, then I would want to know, and it doesn't really matter which service flags that this is the part of the project we need to go look at. Okay.
G
Did we have a discussion about the outage? It feels like that might be a good topic, if people have time. Oh.
K
Right, so Google Cloud has a status page, and the information in there is pretty useless with regards to global systems, right. One thing that Google, for example, has as the answer is that they mitigated problems with the Google Cloud load balancers. That is the thing that's given us the biggest grief right now for this particular project.
K
There are some other fun problems going on with the global components. So, did you hear anything about the Google Cloud load balancers, what mitigation is being done to them so that they can actually work in Europe?
G
I think there are, like, distinct things. You couldn't configure it at all, briefly; that is mitigated. The thing that we're running into is that we have it configured, but somewhere in there, traffic that would previously have been routed to the Paris backend still isn't being routed properly. I'm actually not completely clear.
G
What I'm seeing is that we're getting a 503 from the backend, which makes me think it's still trying to route there. But I think it may be that the control plane is back but the data plane isn't back for Paris, because I think the public GCLB is like a broad edge that's anycast. So I think what we're encountering as a project is: the edge that is affected is still broken for us in that area.
G
So even if you're not on GCP, because the traffic is still coming into Paris and the Paris data center is having problems, that's not getting updated to route somewhere else. For the traffic in other regions it is updated now, and from what I can tell, if you hit any other GCP region, it won't get routed to the Paris region. Even though we're not running there anymore, it'll get routed into one of the other regions now.
G
So the problem is the traffic that's actually trying to enter through that data center, as best I can tell. And so, to Eddie's question, that is my question: is there anything else we can do to make sure that that traffic would have failed over? Because, generally speaking, this is what happens: we have a GCLB configured, and with the GCLB we have a scope for each region that we're operating in, and network endpoint groups that then point to a Cloud Run service in the same region.
G
So when you hit the GCLB, you're going to enter it through, like, the nearest Google location, and then it should get routed by our routing map, and our routing map says, you know: these are the regions we're in, route to a service in that region. So even if a region isn't automatically removed, we can do what we did today and remove the region.
G
We had problems like that earlier, because the control plane had an outage when this region had its outage. That's fixed; it didn't persist very long. What we've seen is that there are still some transient errors, and it seems that the data plane is still not updating in that area. I don't know if there's anything more we can do about that as an end user.
K
Yeah, but Google didn't quite make it clear that there were control plane failures. When you read this outage at first glance, you'd think, oh, Google just disconnected the region and they fixed it outright. But that's not exactly what happened: global components that shouldn't be failing were failing for very long periods of time.
J
I think also, like, Ben, we should emphasize: you are doing a good job of speaking as a member of the community that happens to have, like, you know, access to more stuff, but you are not revealing inside information, and you're not speaking as a Googler here. These are the things that we have observed in registry.k8s.io, right.
J
This is not like the GCLB update, and I don't think we have any particular insight into what's going on exactly there beyond the status page.
G
I can point out that there are multiple different status entries and they affect different things. The one for the control plane for GCLB specifically is that last link, and that one is considered resolved; you can see that.
G
I'm sorry, I've lost track. I'll dig for that.
K
All right, because that's the first thing people ask us, right: all right, Google had an outage, we lost the region. It's happened before; it happened last year, London was gone for a while. But you'd expect the control plane to be buggy for a few hours and then for things to be resolved.
G
I mean, I also can't do a whole lot about that. I'm just a software engineer there; I'm largely participating here as a community member, which, you know, Google is staffing, but like the rest of the group I'm still largely on the user side when we're talking here. I'm going to be talking to some people that have more knowledge of GCP networking about what our options are, and taking advantage of that.
G
And, I mean, yeah, in the case where the data center is partially underwater, yeah, that's some problems.
G
I really hope that we can do more to route around it, and as an end user I'm not thrilled about that, but I'm also still trying to confirm if there's anything that we aren't doing in the configuration. Like, I don't see that we have any health checks, for example; it looks like that might be a thing.
K
Serverless NEGs never had health checks, from day one, so you were never able to remove a dodgy Cloud Run service from serving traffic. Well...
G
...do that. Then we can look at what we can do, besides the bug, as a user: we could probably operate something ourselves that auto-updated it, like removed it from the group or something, and we've manually removed the one. So my bigger concern is that we've removed the group but I still see errors, and I've attempted to investigate that.
G
Don't
have
a
better
answer
at
the
moment,
just
my
my
theory
from
what
I'm,
observing
and
I'm
going
to
be
asking
someone
who
actually
has
more
expertise
on
one
of
our
when
a
gk's
networking
teams
to
lend
some
time
to
this.
We're
originally
going
to
be
talking
about
the
cloud
provider,
removal
and
how
we
can
help
get
that
testing
running,
but
we've
talked
to
them.
G
So we're going to bump that, and we're going to discuss mainly, like, what else we can do here. You all know them, most of you probably: Antonio works on GKE networking now, and from the GKE side GCLB is one of their things, so I'm going to be picking Antonio's brain a bit about GCLB and what else we can do there.
G
This
thing
has
no
SLA,
and
this
is
a
good
reminder
to
to
mirror
if,
if
uptime
in
a
particular
region
is
really
important
to
you,
otherwise
you
can
pull
from
another
region
and
we're
fine.
As
far
as
I
could
tell
between,
like
whatever
provider
you
use.
Proximity
to
the
to
the
gcp
region
is
is
a
problem
because
we
have
a
global
load.
Balancer
I
think
it's
not
super
reasonable
for
us
to
try
to
like
run
multiple
Cloud
looking
answers
or
something
and
something
like
this
can
happen.
G
We can have an outage for a region; we'll do our best to react to it. But if you need higher uptime guarantees than that: as a global service, it is still largely available.
G
And that's working as intended. The other thing I'd like to do is roll out to more regions, but I think we're blocked on concerns with the image promoter. So I think there's still more to do to make the image promoter scale better, so we can spin up more regions.
G
I've been talking to John Duffel about that. I think we're maybe still not at a point where we'd be super comfortable with it, but we should revisit, and if not, there are a few more things we can do there. There's been some improvement to cosign recently. John had to quit working on this, but we have a couple more pointers from John.
G
Yeah, I'll definitely ask along those lines: you know, we'd like to know more about what happened and what we could do there, and I'll be talking to you about, you know, what else we can do to configure the load balancer better, or to route around this, that sort of thing. One other thing we have to consider is that we probably don't want, like, super excessive failover. As an extreme example, if we stopped serving traffic out of US East on Amazon, our bill would go through the roof.
G
If
there's
an
S3
outage
in
U.S
east,
we
probably
just
need
to
tank
it
and
not
try
to
Route
around
it
that
like
we,
we
really
want
to
be
serving
that
sort
of
thing
in
region
and
that's
more
or
less
What's.
Happening
Here
is
that
we
have
20
GC
20
gcp
regions
currently,
and
one
of
them
is
impacted
and
it
in
the
rest
of
the
traffic
is
fine.
K
I
mean
it's
not
quite
that
all
right
you've,
just
not
input
yeah.
Well,
the
odd
fat
registry
isn't
available
by
the
way
which
is
One
impact.
The
other
problem
is
you're
going
to
pull
from
S3
because
nobody
can
access
Google
cloud
services
in
France
at.
G
All
right,
but
what
I'm
saying
is
only
only
the
traffic
that
went
to
that
out
of
the
20
places
that
we
route
through
only
the
traffic
that's
hitting
now
when
it's
affected.
If
you
are
running
in
another
Europe
region
on
any
of
any
provider,
then
and
you're
getting
routed
to
another
region,
then
you're
not
impacting
yeah.
B
Say that again. So, yeah, basically, I mean, we still state everywhere that if you are really, really reliant on everything, then you should have the registry duplicated somewhere in your own infrastructure. And, I mean, the downtime that we had, or the downtime that we have, is basically fairly reduced now, to one region. Yes, it's still bad, but I don't think that we should stress too much that there was an outage.
G
Yeah, we just had confirmation from Marco earlier that if you're running in another region, if you move to another region in the EU on AWS, then you'll see what you would see on GCP: you'll get routed into a different region, and you'll be fine. It's the subset of traffic that's close enough that, out of the 20 GCP regions, it gets routed to the Paris region, whether or not it's on GCP.
G
And if we expand to more regions, that would be another way to improve our, you know, failure domains. But I think it's mostly working relatively as intended right now. Even if we do expand failover, we have to be careful with that, because the whole intention here is that you are served out of the closest region, and that's good for users globally.
G
That way we don't have a sudden spike or cost increase for one of the regions. And, you know, if one out of the 20 regions going down is a problem for you, then you're a really good candidate to host a mirror.
I
Okay, are there any other questions regarding the outage?
I
Okay, we can wrap it up here then. Thanks, everyone. I'm sure we'll be following up in Slack and all this stuff. See you around.