From YouTube: k8s-infra-team's Biweekly meeting 20200109
Description
wg-k8s-infra recurring biweekly meeting
E
So the main piece that needs to be done is launching, or I should say creating, e2e tests for the auditing mechanism, and then afterwards launching it on prod. When we do launch it on prod, we need to make sure that if it fails or errors out, permissions are set up on the project such that people, or a group, are notified about those failures, not just me. That part can happen at some point in the future.
D
Somebody else sent me a PR for the Terraform stuff for now, because I'm sure that will take some iterations while we convert the scripts over. So I would say: go look in groups.yaml and find all of the... we wouldn't call them roles, but they basically are role groups, and see if one of them maps to this group. It's sort of not the alerting group, now, but who's going to administer the auditor, right?
D
Okay. I don't know if we need a separate one for container registry, or if that should just fall under storage admins. I'm pretty sure we have a... I know we have a storage admins group. If that's sufficient, then that's the group we'll want to add permissions to, to be able to run the auditor.
E
So, off the top of my head, I can't think of tasks that could be done at the same time in parallel, but maybe... I mean, it's already merged in, but people could look at how the auditor works. On the way to work this morning, I just thought of one additional feature, which was a requirement that I don't know if I've implemented: if an image changes in GCR and it's what I call a child image of a fat manifest. So, you know, there's the fat manifest, and then you have, like, child images.
E
There could be, like, ten different architectures, so one of those child images gets pushed out first, which makes sense, because that's how fat manifests are created. That means the Pub/Sub messages coming from GCR will say, you know, "this child image"... it's not going to say "child image", because it's not that smart; it will say this image was added to the registry. Now I need to check, because it's not going to show up in our promoter manifest YAML text.
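The digest-only case being described can be sketched with a small filter. This is a hedged illustration, not the auditor's actual code: the payloads are made up, and the field names follow GCR's documented Pub/Sub notification format, where a child image of a fat manifest arrives as an `INSERT` with a digest but no tag:

```python
import json

def is_untagged_insert(payload: str) -> bool:
    """Return True for a digest-only INSERT -- what a child image of a
    fat manifest looks like when pushed ahead of its manifest list."""
    msg = json.loads(payload)
    return msg.get("action") == "INSERT" and not msg.get("tag")

# Hypothetical payloads in GCR's notification shape.
tagged = json.dumps({
    "action": "INSERT",
    "digest": "gcr.io/my-proj/img@sha256:aaa",
    "tag": "gcr.io/my-proj/img:v1",
})
untagged = json.dumps({
    "action": "INSERT",
    "digest": "gcr.io/my-proj/img@sha256:bbb",
})

print(is_untagged_insert(tagged))    # False
print(is_untagged_insert(untagged))  # True
```

A real auditor would then look the digest up against the promoter manifests before deciding whether the push is legitimate.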
A
Sounds good. So, about the new cluster: I think I'm currently working on checking how things are going with monitoring, whether we have some consensus, et cetera, because for the last few weeks I missed many things. But about the turning-down of all this you have shown, how is the other thing going?
D
The next steps were to... I think somebody sent me a PR to start converting the k8s.io stuff. We created the namespaces; that one actually should be relatively easy. There's not a lot of state there, but I'll have to manually copy the old secret so we can make a clean transition. And there is a list of the other ones; we filed a separate issue for each of them before the holiday was up, and they all seemed relatively approachable. At this point they're backed by stuff we know works.
D
We still don't have monitoring and alerting, but I guess it's not really fair to raise the bar beyond where it already is, so I guess I'm not standing in the way of that progress anymore. Somebody else was also working on the promoter, or rather the publisher, which is the only thing left running in that other dev cluster, and I'd love to turn that off.

I saw yesterday that there's a bunch of PRs that were opened in the last two weeks, some of which I got through email and some of which I didn't; see my tweets about GitHub changing the way their emails work. But I will take a look at those PRs. I probably will not have a chance to until next week, obviously.
D
I don't know what that means either. Maybe I'll use that as a jumping-off point, though. I ran the auditor script just before the holidays, and a few things came to mind. One: this script was logging a whole lot of stuff that wasn't really useful for auditing purposes, so I sent some PRs and tidied that up a little bit; it's certainly not done. Second: because we didn't run it on a regular basis, it had a ton of changes that were very difficult to audit, because they were so large.
D
So I did a pass. I hope that... you know, I didn't find anything that was glaringly obvious. If people want to contribute... it looks like it's relatively sparse day to day, but if there's something people want to jump in and help on, the auditing script, like running through each of the APIs, auditing the things that are important to audit for that API, and producing usable diffs, would be an awesome place for people to contribute.
A
I think that, after this call, I will spend some time tomorrow to create an email to our group about what is happening right now, etc. So I will mention that and pass it on to somebody who can help. Okay, there is no topic... there are, like, open discussion topics here, and there is a topic from [moosh loop] about new accounts for image-builder projects. Can you tell us more?
B
Yeah, I'm [worship]. So, as part of the image-builder project, there are two distinct requirements. One is the ability to add e2e tests for building images, some of which require nested virtualization and some of which require different accounts, and we can probably get around some of those. And then the second requirement is to be able to publish those images. So initially, this would be... I think Timothy St. Clair said he wanted CAPI head and image-builder head to kind of run on Kubernetes head, on a combinatorial basis, so that you're testing the tips of each of those against each other, and then eventually getting to a stage where we can publish images to the community and take some of that workload off of existing subprojects that are kind of duplicating the effort.
B
So, on using an existing GCP account: we want named accounts for publishing. If we're using the AWS or GCP accounts we have currently, we would get a pool from Boskos, if I'm understanding it correctly. It would be a little bit harder to get access to a direct account and to limit who, using Prow, gains access to those accounts. So that's kind of why we can't use the existing accounts and why we would want to have new accounts.
B
So these are... as part of SIG Cluster Lifecycle, there's an image-builder project. At the moment those images get built out-of-band and published from people's desktops, and it works there. Yes, so we want to add e2e testing of those images: add pull-request testing or nightly testing, so that the images produced by image-builder actually run a Kubernetes version.
D
And we can only build them by spinning up VMs as well? Why is that the case?

B
So, for example, if you want to build images using nested virtualization, you can do that on GCP, but you actually need to create a custom image template, which, if it's being spun up in a different GCP account, means every build is going to be quite slow. So you do want to create one image template to host the nested-virtualization host, to be able to build images. That's on the GCP side, and then on the AWS side...
B
So there are two kind of... two ways of doing it. There's that way, which is the Packer model, and then there's another model which uses QEMU and basically downloads a cloud image, spins it up locally using nested virtualization, configures it, and then creates an image out of that, without requiring the cloud resources, but it does require nested virtualization. So there are two kind of different competing models at the moment. Okay.
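For the Packer-style model, a hedged sketch of what the nested-virtualization host image could look like on GCE (project and image names are invented; the `enable-vmx` license is GCE's documented switch for nested virtualization, and `image_licenses` is the Packer `googlecompute` builder option that applies it):

```json
{
  "builders": [
    {
      "type": "googlecompute",
      "project_id": "k8s-example-image-builder",
      "source_image_family": "ubuntu-1804-lts",
      "image_name": "nested-virt-host",
      "image_licenses": [
        "projects/vm-options/global/licenses/enable-vmx"
      ],
      "ssh_username": "packer",
      "zone": "us-central1-b"
    }
  ]
}
```

The QEMU model skips this: it needs no cloud resources at build time, only a host that itself supports nested virtualization.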
B
I think a named account... so, one account per provider. I don't know who handles the Azure ones, but we can probably talk to them directly. Just a named AWS account and a named GCP account that we can potentially give access to only these specific jobs, so maybe they run on some trusted Prow cluster or something like that.
D
So, the pattern that we're following from the other staging projects, with respect to container images, not VM images, is that we have a project... GCP calls them projects, Amazon calls them accounts. We have a separate GCP project for each staging effort. The staging project comes with a GCR registry and a GCS storage bucket, and could come with other things if we wanted it to, which could include the ability to run VMs.
D
But what happens today is we then link it via Prow. So Prow watches your source repo; changes to your source repo trigger Prow, which does some logging stuff and then triggers Cloud Build in your staging project. Your staging project does whatever your Cloud Build needs to do, and then publishes the images to that staging project. So I don't know how much of that maps exactly to VM images.
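The linkage being described (Prow watching the repo, fanning out to Cloud Build in the staging project) roughly corresponds to a postsubmit job that runs `gcloud builds submit`. A minimal sketch, with all repo, project, and image names hypothetical:

```yaml
# Hypothetical Prow postsubmit: on merge, kick off Cloud Build in the
# staging project, which builds and pushes to that project's GCR.
postsubmits:
  kubernetes/k8s-example-repo:
    - name: post-example-push-images
      branches:
        - ^master$
      decorate: true
      spec:
        containers:
          - image: gcr.io/example-images/gcloud:latest   # any image with gcloud
            command:
              - gcloud
            args:
              - builds
              - submit
              - --project=k8s-staging-example
              - --config=cloudbuild.yaml
              - .
```

The `cloudbuild.yaml` in the repo then does whatever the build needs and pushes to `gcr.io/k8s-staging-example`.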
D
So we'd probably want to do something very akin to the promoter process that we're working on for images. Justin, who I don't believe is here, wants to do the same promoter process for arbitrary staging-bucket artifacts, so maybe that covers this. The idea being, again, that we have one true prod, or some number of true prod buckets, and only a bot touches those; as a human, you file a YAML pull request that says...
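That YAML pull request is a promoter manifest. A minimal sketch of its shape (the registry names, service account, and digest here are invented, not the real prod configuration): the bot promotes, by digest, from the `src` staging registry to the prod registry that only it can write to.

```yaml
registries:
  - name: gcr.io/k8s-staging-example        # humans and CI push here
    src: true
  - name: gcr.io/k8s-example-prod           # only the promoter bot writes here
    service-account: promoter@k8s-example-prod.iam.gserviceaccount.com
images:
  - name: example-image
    dmap:
      "sha256:0000000000000000000000000000000000000000000000000000000000000000":
        - "v0.1.0"
```

Merging the PR is the human act of promotion; the bot reconciles prod to match the manifest.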
D
Okay, so we have two options. One: you can follow both branches simultaneously. I would say, go start with the staging repos that already exist for the Cluster API; there's probably a half dozen of them for the different providers. See if there are enough permissions already. My guess is there's probably not, and you're going to run into a brick wall that says, you know, you need access to the compute API or something like that, and then we can talk about how we govern that. The other side of it would be...
D
So, okay then, given my druthers, I would say start with the Cluster API projects. If you don't have access to those already, you can add yourself to the appropriate groups in groups.yaml. Okay, do you know where that is? Sorry, I'm being very vague, like you're new to the group. The groups.yaml is in the subdirectory called groups under the k8s.io repo. Okay, I'll...
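For reference, entries in that groups.yaml look roughly like this (the group name, description, and member address below are invented for illustration); adding yourself is a one-line PR to a `members` list, which the group reconciler then applies:

```yaml
groups:
  - email-id: k8s-infra-staging-cluster-api@kubernetes.io
    name: k8s-infra-staging-cluster-api
    description: |-
      ACL for the Cluster API staging project
    settings:
      ReconcileMembers: "true"
    members:
      - someone@example.com
```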
F
So one thing that is still, like, a preventative thing, is the retention rates. We have a retention of 60 days on staging storage and the staging GCS buckets, which was sort of arbitrarily done, and there's two things which might be nice to get focused first, before we go production on the image promoter. One is the retention for staging images: currently we don't have anything, so we don't clean them up, so it's basically retention forever.
F
We might not want to do that, because it might lead to people using staging images instead of the production images. And the second thing is, the test cluster, or the tests basically, might get pushed into a specific GCS bucket which might need to be around longer; basically, all the logs from the tests, where the retention of sixty days was sort of asked to be lifted, or at least not be added. Does that make sense?
D
So, let's be clear, there's a difference between... unfortunately, the words are very confusing. There's a difference between retention, which means the minimum amount of time you must keep it, and lifecycle, which lets you configure the maximum amount of time before it's automatically deleted. For production, we set the retention to ten years, so anything that gets uploaded to production will stick around for ten years; put a little asterisk on that, because I'll come right back to it. And we set no lifecycle.
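In GCS terms, the two knobs are separate: the minimum keep is a bucket retention policy (`gsutil retention set 10y gs://<prod-bucket>`), while the maximum is a lifecycle rule. A rule like the following (age is in days; bucket names here are placeholders), applied with `gsutil lifecycle set lifecycle.json gs://<staging-bucket>`, auto-deletes staging objects after 60 days:

```json
{
  "rule": [
    {
      "action": {"type": "Delete"},
      "condition": {"age": 60}
    }
  ]
}
```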
D
We don't auto-delete anything from production. For staging, I agree we sort of want to delete them after a certain amount of time. Here's going back to that asterisk: GCR doesn't support lifecycle for images yet. Now, GCR happens to be built on GCS, and GCS does support lifecycle, so I can go ahead and put that in place, and in fact it kind of works: it deletes the image blob, but it doesn't delete the image header. The image header is stored separately.
F
So there should be an open issue; I looked it up. In the docs it also mentions that you can do that on the Google Cloud side, basically, but not on the GCR side, which makes sense, as you mentioned. And the other one I'm still looking up is the test logs: if you want to keep the logs longer than sixty days, something was mentioned like half a year, or is it?
D
The test logs, are these the testgrid ones? And where are we storing those? Do we have a separate bucket for those somewhere? The plan is to have a separate bucket, I think. Sure, I would consider those to be prod logs and set their retention to ten years. There's no reason to delete them that I can think of, yeah.
D
Which isn't that big a deal for the end user. So I'm happy to change or set retention for staging stuff if we think it's useful; I'm happy to fiddle with numbers and move it from 30 days to 6 months if we need to. In the end, I just want as much automation around cleanup as possible, so that we don't end up with a bill we're looking at, going, "Well, I don't know, what do we need? Are these okay?"
D
Yeah, it's true. And honestly, keeping things forever in storage is not a huge deal, other than it gets confusing as to whether people can rely on them being there forever. I feel like somewhere between zero and one year there's a threshold beyond which it effectively becomes permanent, right? So if we have a policy that, like, at six months these things go away, then people will stop relying on them; but if we keep them around for a year, we might as well keep them forever. Yeah.
D
I mean, six months is two releases on the current cadence. I'm okay with that; I don't think that's egregious. It feels a little on the long side, but if we think that dipping between two major releases is useful, then that's fine. Honestly, if you haven't promoted something out of a staging repo within six months, you probably don't need it, but...
F
Before we jump to production, I would probably advocate for setting the retention as low as possible, because increasing it is always better than breaking someone that is relying on the said retention. Sure. So I would go with 30 or 60 days, because that feels like a normal dev cycle; why would you have something that is not pushed to production, to the production bucket or to the production GCR, yet?
D
Yet... well, just let me be devil's advocate: what are we protecting against in putting a lifecycle... putting a retention on staging? Mostly, staging is operated on by humans. So are we afraid that humans are going to come through, do something dirty, and then try to clean up their own mess? Or...
F
So, we had one issue with Docker Hub still having images which weren't updated for more than two years, including the pause image, which was used in, I think, VMware's testing infrastructure, and we completely broke them just by removing a pause image, because they relied on it being there. If we had a retention rate, or at least the expectation of 30 days, that might not have happened.
D
I don't see how those two things are linked. Just to draw the analogy: Docker Hub was a prod-like repo that somebody was assuming was still alive, and it really wasn't, but we don't have a good way to advertise that, nor do we have any sort of retention policy on it. That's not a staging repo; it was never a staging repo. Staging repos explicitly say "staging" or something like that. So, if, like, you're going to production and your tests say k8s-staging-cluster-api, you're wrong in the head.
F
True,
but
it's
like
being
wrong
in
the
head
and
making
sure
that
expectations
are
set
to
different
things
like
I,
remembered
that
that's
some
Etsy,
do
you
like
service
internally,
was
to
hold
tolerance
that
everybody
was
using
it,
so
there
might
have
been
introduced
arrows
to
like
remove
that
huddle
or
to
expectation
that
it's
not
the
reliable
service.
Overall.
Sorry.
D
I
just
realized
I'm
Ari
fifteen
minutes
late
for
my
next
meeting,
I'm
happy
to
discuss
I'm,
not
as
convinced
as
I
might
be,
but
it's
not
also
a
huge
deal
with
the
caveat
that
I
can't
actually
do
anything
about
it
for
container
images.
So
the
example
that
you
used
is
kind
of
moot
anyway,
because
retention
doesn't
actually
work.
So,
let's
take
it
to
slack
or
something
yeah.