From YouTube: Kubernetes SIG K8s Infra - 20230524
A: That got fixed; now we see costs going down. We're now, yeah, back to what we were expecting, except that I don't see anything very specific or... okay, I see a cost increase in compute.
A: Related to the build cluster? I'm not sure; there are many possibilities, maybe jobs moving from Google to our own.
D: Sorry, finish; I was going to say something, but it can wait.
A: Okay, for the month by month, we're not far along in the month yet, but I think we're still in the same channel as last month. So I think that's pretty good.
A: The one thing I would like to do in the future is to delete the accounts we don't use at all, because I see a lot of accounts that are not used. But I'm also curious about that.
A: There's, like, the deletion; the deletion takes 90 days, so we can trigger it. And there's no way to disable a service like there is on GCP, so I'm like, yeah. We now have an entire organization, and we now have multiple Boskos accounts. We should probably start to delete them. I also see it as a benefit, because we can now force people to migrate to the community infrastructure.
E: Like, for example, the CI stuff that wasn't using Boskos yet: some of it was referencing, like, a secret with an account credential that points at some account, but we don't want to keep doing that anyhow.
C: We want to cycle through, take five of them or so, and just try to do a round of deletion, just to run through the process, so we can have some confidence in deleting larger swaths and know how long it'll take to go through the deletion process.
A: So mostly, when you delete an account, it's not fully deleted until 90 days. So basically the account will be suspended before it disappears, before the account is gone. Recovering is mostly opening a case with support, so I'm not really worried about the deletion process.
A: I'm not aware of a way to do that; you'd have to ping the AWS folks, maybe they're more expert on that, to figure out what is inside this account. But in any case, I'm more interested in job migration to the community-owned accounts. That matters more to me than figuring out what's inside the account, because if you migrated any job right now, that would give us some insight about exactly what's needed to make those jobs fully run.
A: Okay. Oh, Chief is not here; I had a question for him.
D: We have a dedicated monitoring stack for that cluster. It is running in the cluster, it is based on Prometheus, and it is not exposed in any way at all. To access it, you need to have access to the cluster, to do something like a kubectl port-forward, get access to the Grafana service, and then access it from the browser. But that's not ideal, and it is definitely not accessible by the community. So if, say, some jobs start running into problems, they don't have any tool to investigate what's going on; to, for example, figure out if there are any problems with resource limits and requests and stuff like that in general, especially now that, for example, we foresee resource usage doubling. Now, the idea, generally speaking, is that we expose that stack, and I just brought this up for discussion with the folks that are working on it.
D: That's Patrick and Jan, who is not on the call today. But generally speaking, Jan created a PR that basically creates a LoadBalancer Service for that Grafana setup, and that is going to expose it. But we need to figure out a domain, like how we are going to access it, and I thought about using something like monitoring.prow.eks.k8s.io.
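A minimal sketch of such a LoadBalancer Service, with placeholder names; the namespace, selector, and ports here are assumptions, not the contents of the actual PR:

  apiVersion: v1
  kind: Service
  metadata:
    name: grafana-external      # placeholder name
    namespace: monitoring       # assumed namespace
  spec:
    type: LoadBalancer          # on EKS this provisions an AWS load balancer
    selector:
      app.kubernetes.io/name: grafana
    ports:
    - name: http
      port: 80
      targetPort: 3000          # Grafana's default container port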
D: But then we need to figure out how to manage that domain, and how to manage certificates. Managing the domain is not that problematic: we can add an entry to the DNS configuration and so on, for example, and then octoDNS can reconcile that DNS configuration. But the problematic part is how to handle certificates, because I don't think we have something similar to what we have with GKE, where you, for example, create a ManagedCertificate resource so that it gets a certificate for you.
D: That doesn't exist for AWS, right? And what I found is this ACM, I think it is called like that, and it allows you to request a certificate for your domain and basically attach it to an AWS load balancer, whatever type of load balancer you are using, like Elastic Load Balancer or whatsoever. But the issue is: how should we handle that? Because there is a DNS verification process that requires some additional records. In that case, we should probably manage ACM with Terraform, I believe, and there's that too.
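For context, ACM's DNS verification works by asking you to publish a CNAME record that it generates per certificate request; the record has roughly this shape (the underscore-hash labels and target here are invented for illustration, ACM generates the real ones):

  _3c9f0a1b2d4e.monitoring.prow.eks:
    type: CNAME
    value: _6a7b8c9d0e1f.acm-validations.aws.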
D: Go ahead.

E: We have automation based on octoDNS that lets you control DNS with YAML. I'm one of the owners. You just, like, send a PR, and when it's approved, the robot applies it automatically. And I think we want to stick to that, because we also have some tooling to help make sure that those changes work safely: like, we roll out to a canary zone first and have some tests and things to make sure that it all actually rolls out properly.
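As a sketch, adding a name through that workflow would be a small PR against the zone YAML, something like the following (octoDNS record format; the record name and the ELB hostname are placeholders):

  monitoring.prow.eks:
    ttl: 300
    type: CNAME
    value: a1b2c3d4e5f6.elb.us-east-2.amazonaws.com.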
E: It's pretty mature, and if you need to do things like certificate provisioning, I mean, usually we would do something like an ACME challenge and delegate just the, like, ACME subdomain to some external service. That's what we're doing for, like, GCLBs going forward; before that, we would, like, create an NLB, get an IP, and then add the IP to the DNS. For this specific entry, we shouldn't need to grant external systems direct access.
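The delegation described here would be one more record in the same zone config: only the challenge label is forwarded, so the external service can answer DNS-01 challenges without any access to the zone itself (the responder hostname below is a placeholder):

  _acme-challenge.monitoring.prow.eks:
    type: CNAME
    value: monitoring-prow-eks.challenge-responder.example.net.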
D: Yeah, that part for sure. I mean, I understand the DNS part; it's pretty easy, at least from what I found out: just create a PR, and that is reconciled. That part is easy. It is just figuring out certificates, because I don't believe we can use the same approach that we use on GKE with managed certificates. So we have to use the AWS approach, and that is going to be a little bit difficult.
E: We previously used cert-manager within clusters, for what it's worth, and, yeah, I really hope we can standardize on ACME challenges. It's just, like, a good standard, and it allows us to delegate things like this without handing over DNS API keys or anything like that, yeah.
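For reference, the in-cluster pattern mentioned here is roughly a cert-manager Certificate pointing at an ACME issuer; a sketch, with the issuer name, namespace, and DNS name all assumed:

  apiVersion: cert-manager.io/v1
  kind: Certificate
  metadata:
    name: monitoring-tls
    namespace: monitoring
  spec:
    secretName: monitoring-tls    # the signed cert and key land in this Secret
    dnsNames:
    - monitoring.prow.eks.k8s.io
    issuerRef:
      name: letsencrypt-prod      # hypothetical ACME ClusterIssuer
      kind: ClusterIssuer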
C: To be clear, Ben's suggestion is that we can point the DNS at a domain that could be managed anywhere, as far as the DNS-01 challenges and whatnot, to provide those certificates.
E: Not necessarily.

G: But sorry, just a quick question about that, if I may: do we want to perform a DNS-based verification, or a different kind of verification for the certs?
G: The only thing in the setup for the certs, essentially, would be just the values in the record that you need to set up in your DNS or IPAM console, and that would be the most separated approach. But maybe, and I'm sorry for joining late, maybe I misunderstood the requirement; that's why I asked in the chat.
G: It is only needed for the initial verification, but I think for ongoing renewals you need to keep that record for the cert. But again, what is the price of keeping that record up there? For any records that you set, or, more precisely, any certificates that you request: if you want multiple certificates, the same record will be used for verification.
E: We have an existing one. Like I said, this looks like the ACME challenges that we're using; it's a different name than the ACME standard, but it's the same idea: use a CNAME subdomain for verification purposes, and that way you don't have to worry about it, you're just forwarding it to their DNS challenge responder. We can just leave those entries up; it will mean that renewals happen, and it's totally decoupled from the load balancer lifecycle. This is best practice going forward.
D: Let me give an example to check, maybe. Let's say you have monitoring.prow.k8s.io, for example; the name is not important. This is a CNAME already pointing to the ELB, because the ELB is basically a hostname, it is not an IP address, so we can't use an A record. And then you need to add the same name as a CNAME to support cert verification, right? The problem is that that's going to conflict, no?
E: No, cert verification is on a subdomain, with a special name for the verification; there's an underscore-prefixed token, and, oh...
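To make the no-conflict point concrete: the host record and the verification record live at different names, so both can coexist (values are placeholders, reusing the earlier sketches):

  monitoring.prow.eks:                     # the host itself
    type: CNAME
    value: a1b2c3d4e5f6.elb.us-east-2.amazonaws.com.
  _acme-challenge.monitoring.prow.eks:     # separate verification name
    type: CNAME
    value: monitoring-prow-eks.challenge-responder.example.net.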
E: This is already the workflow we're using for all of our other certs going forward, so it should be fine. Okay.
E: Yeah, it looks to me like you actually don't even, I mean, you don't need to... you can create the challenge independently of that, because you're just forwarding it to the resolver domain, but...
G: You can use the output of the verification, I think, for the record, as a reference for it. But since these two systems are separate, that would be a semi-manual step. Nevertheless, when importing a certificate you can state these parameters, or, you know, the data sources and intervals, right; you can state whether you want it to be available, so it won't, you know, fail if it's handling or getting a cert that has not been verified yet. But again, that's... that's...
A: That's the only item in the open discussion. Is there anything else people want to talk about?
A: Okay, I have, like, one, I mean, two questions, one for Marco and EP: are we good with the scalability test account? Sorry, is it now onboarded, or is there something else we need to do?

D: Sorry, can you repeat that?

A: I was asking about the scalability account; I want to make sure it is onboarded in Boskos.
D: Yeah, I'm not sure anyone reached out to me about onboarding those accounts; I guess I don't even have access to that account. So we can probably follow up tomorrow, if you're around, and figure out adding them to the EKS prow cluster, so that we create credentials and create a resource for that in Boskos. That can be done; we can do that tomorrow, if you're around.
A: Yeah. Evie, can you take care of that? Because I'm going to be on PTO again soon, so my bandwidth is kind of limited. I'm asking this because James pinged me last week about this one, to make sure we make progress on this; go ahead, move it forward. Now, to my second question.
D: Yeah, okay, let's figure that out; we will see how we can handle it, but yeah, I think it's not a problem.
E: Some follow-up with the sig-scalability folks, to get that working properly against clusters that aren't kube-up. We have a job, a demo job, for this, but we want to make sure that it's using Boskos, that it actually starts running the scale tests instead of standard e2es, and I think the log dump script needs porting.
E: That last part is probably the trickiest. The rest is, you know: look at existing jobs and change the config options to select the right tests and to use Boskos instead of a fixed account.
E: Right, the last part... I mean, sig-scalability has a separate script that they own right now. We won't go into too much detail here, but basically, when you're doing scale testing, instead of fetching logs from the nodes to the prow job and uploading them, you really want to run something that uploads from the CI node, or from the cluster nodes, straight to the output, because you avoid a lot of extra data transfer, instead of having to pull it all back into the job pod, with a massive, like, you know, thousand nodes' worth of system logs, and then uploading those to the job storage.
E: We have it set up so that when we run the kube-up jobs, we run a script that, like, SSHes to the nodes, grabs the system logs, and pushes them up from the remote machines. That's not supported on other types of clusters currently; we should port that to work with kOps clusters.
E: That's going to be a little bit of a blocker for really big clusters. It is a pretty major performance hack for CI, okay, and we need the logs to, like, you know, debug the cluster.
E: I think that's, I mean, anybody could hack on that, but I think we were hoping for sig-scalability to work on that, because it is only used by scalability jobs. For other jobs it isn't that big of a deal to just pull the logs down to where we're running the tests and then stick them in the artifacts folder, like any other output from the tests, and that's more portable. But for scale tests, it becomes performance-sensitive enough that we're going to want to employ a workaround similar to that one. And right now those scripts are in test-infra and only work on kube-up GCE clusters, because they depend on SSHing to the nodes and having access to upload with gsutil.
E: They're currently going to the existing job results bucket. I guess it would probably be okay if they went somewhere else; I think this is a follow-up concern once we get the tests running, but it will need to happen.
E: So: make sure the job is using the Boskos pool, just change that config; do any follow-up that needs to happen with the kubetest implementation; and then make sure that the test flags get switched to run the scale tests instead of normal cluster e2es. Those are the next steps.
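A rough sketch of the kind of job config change being described; the job name, image, and every flag below are placeholders, since the exact kubetest wiring is the open follow-up:

  periodics:
  - name: ci-kubernetes-e2e-kops-aws-scale   # hypothetical job name
    interval: 24h
    decorate: true
    spec:
      containers:
      - image: gcr.io/k8s-staging-test-infra/kubekins-e2e:latest
        command: [runner.sh]
        args:
        - --provider=aws
        - --boskos-resource-type=aws-account  # placeholder: pooled account, not a fixed one
        - --focus=scalability                 # placeholder: scale tests, not standard e2e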
A
Do
we
that
we
async
trying
to
figure
out
a
job
consider
to
basically
ensure
we
have
we
use
the
proper
AWS
account
subscribers?
We.
E: So we're dumping, for example, the kubelet logs from each of the nodes. We want to dump them, because people need to be able to come back after the CI has failed and understand, like, what's wrong with the cluster; if we try to filter it, we won't know in the future. We can look at, you know, storing logs within the cloud that they were uploaded to. I don't think that's a huge priority right now; it's mostly just that it's slow to do extra copies, as opposed to, like, the cost, I mean. Similarly, we don't cap the log sizes right now; we're just serving them directly out of a bucket.
E: The real cost issue is just having to pull them all into the job and then upload them from there, versus just directly uploading them from where they are. When you have, like, 5,000 nodes, that starts to become a pretty significant bottleneck to completing the run.
G: What retention policy or object lifecycle do we have? If we can't diminish or reduce the data transfer costs, can we maybe reduce the contents of the bucket?
E: We have one; I think it's 90 days currently, and there's some follow-up to, like, tune that sort of thing, but I don't think we need to block on that for this. And same thing: if we had to run it one of the other ways, where we just pull logs down from the nodes and upload them, that's fine to start, while we're running smaller-scale ones. But if we want to seriously switch to using kOps instead of the GCE shell scripts, so that we can have a tool that works for both clouds, we'll need more than that.
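For reference, a 90-day retention policy like the one mentioned is the kind of thing expressed as a bucket lifecycle rule; a sketch in GCS lifecycle form, with illustrative values (gsutil accepts this as JSON):

  rule:
  - action:
      type: Delete
    condition:
      age: 90     # days since object creation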
E: We will need to have something like the scale-test log dumper that works ideally on kOps AWS and kOps GCE. And that will have to be a special thing: we can't just use the Kubernetes API or something; we need to actually get onto each of the nodes and grab all the system logs, even if the nodes are in a bad state. So you need to do something like... on GCP, we use gcloud compute ssh; we'll want to do something like that.
E: I don't think... I think we'll want to make it support both. I don't think we want to jump to diagnostics; usually we do that by, like, using Kubernetes API features, but we kind of fundamentally need to reach the nodes. We also need to know, like, what the node names are and that kind of thing. It'll probably need to be kOps- plus AWS- and GCP-specific, and if we ever have a third cloud, we can extend it.
E: Again, we only need this for scale tests, so we probably don't need to be running scale tests on, like, every cloud, and we don't have the resources to do that today. But we do want to get out of only running scale tests using the GCE bash scripts. We're starting just with kOps AWS, but something that I'm hoping to press for later is to switch the GCE ones: we'd still have to use kOps GCE, so we can very closely match and get away from those single-cloud shell scripts. But for scale testing, the scale-test log dumper is kind of unique.
G: Understood. Now I'm just thinking about the options, more in terms of sending logs. If you're running on AWS, you can use the AWS tools, like a CloudWatch agent, and could maybe add some files there during the run and only activate it for that process. But I'm not sure there's, like, a direct-to-S3 option, unless you're taking the bash approach. Maybe that's just why I wanted to talk about retention, yes.
E: So, very briefly: Kubernetes CI basically has a standard thing where there's a directory that is specially handled, and when you run some tests they can dump things into there, and they get persisted into a storage bucket that is surfaced through the CI UI.
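Concretely, for decorated Prow jobs that directory is exposed to the test container through the ARTIFACTS environment variable; a sketch, treating the path as the usual pod-utilities default rather than a guarantee:

  # Anything the test writes under $ARTIFACTS is uploaded to the
  # results bucket and surfaced in the CI UI.
  spec:
    containers:
    - name: test
      env:
      - name: ARTIFACTS
        value: /logs/artifacts   # default used by Prow's pod utilities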
E: Things like that. So we'll want to continue to store things there, to support all that tooling and, you know, the other users in the project that are expecting logs to be there; and we have some tooling built on top to do things like, you know, viewing the files and whatnot. It's not great, but it's what everyone is used to, so we'll probably just want to, you know, get them copied in there.
A
Sorry
I
can't
see
if
they
unlock
a
Android,
so.