From YouTube: Kubernetes SIG Testing - 2020-10-20
A: Hi everybody, today is Tuesday, October 20th, and you are at the Kubernetes SIG Testing bi-weekly meeting. I am your host today, Aaron Crickenberger, also known as spiffxp on GitHub, Slack, and all the places. We're all going to adhere to the Kubernetes code of conduct during this meeting. If you have any questions or concerns with my behavior or the behavior of others, you're welcome to reach out to me or conduct@kubernetes.io.
A: So I put a couple things to just kind of revisit on today's agenda based on our discussion last week, so I guess I can share my screen to kick us off, if that's cool with folks. Let's see... cool, I think you're looking at the meeting notes.
A: So first up, I wanted to talk about the upcoming Docker Hub changes that are rolling out November 1st. The TL;DR is in the issue description here: basically, as of November 1st, Docker is going to rate limit pulls of images from Docker Hub.
A: The concern here is that this rate limit applies across all images. So if we focus on the images that are pulled most frequently, as I'll show below, I think we're okay, but the concern may be that the long tail of jobs that run across the build cluster may end up causing random nodes to start getting rate limited by their IP address.
A
My
my
quick
look
at
this
is
that
it's
gonna
be
a
lot
of
plumbing
to
do
so,
but
that
may
be
an
option
worth
investigating.
A: Another idea folks suggested was implementing some kind of pull-through cache, which again we'd have to hook up to all of the build nodes as well as all of the clusters under test. For mitigation, we decided we should sort of audit all of our tests and image builds and see just how bad the situation is, and we should generally advise people to not use Docker Hub and instead use k8s.gcr.io, which is the image registry, or repository, that the project provides for everybody.
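For reference, a pull-through cache along the lines discussed here is usually an upstream registry running in proxy mode. The snippet below is only a sketch of that idea (not something the meeting settled on), using the open source Docker registry's documented proxy configuration:

```yaml
# Sketch of a pull-through cache: an instance of registry:2 configured to
# proxy Docker Hub. Build nodes and clusters under test would then need to
# be pointed at this registry as a mirror, which is the plumbing concern
# raised above.
version: 0.1
storage:
  filesystem:
    rootdirectory: /var/lib/registry
http:
  addr: :5000
proxy:
  remoteurl: https://registry-1.docker.io
```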
A: So Antonio rightly honed in on all of the images used under Kubernetes, which works out to these. I sort of found the same thing by going and looking at the kubelet logs from a cluster under test, and saw that these were, at least for the default job, all of the images that were pulled from Docker Hub on a given cluster under test.
A
And
then
I
tried
to
survey
the
default
build
cluster
within
google.com,
but
did
not
have
the
appropriate
credentials
or
time
to
survey
that
so
then
we
started
look
so
then
I
sort
of
surveyed
all
of
our
job
configs,
to
see
what
are
all
of
the
images
that
are
used
that
are
not
hosted
by
gcr
to
io.
You
can
see
a
lot
of
golang
and
node
and
python,
and
alpine
and
stuff.
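A survey like that can be approximated by grepping the prow job configs for image references that are not on a gcr.io registry; a rough sketch, where the paths and patterns are assumptions rather than the exact commands used:

```sh
# Rough sketch: list images referenced in job configs that do not come
# from a *.gcr.io registry (paths and patterns are illustrative).
grep -rhoE 'image: *[^ ]+' config/jobs/ \
  | sed 's/image: *//' \
  | grep -v 'gcr\.io' \
  | sort | uniq -c | sort -rn
```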
A: It's not something we need to explicitly specify in our job configs, we're just going to get that for free, and it includes pretty much all of the images that Antonio laid out here, including alpine and nginx and perl and redis, which I think are kind of verified later down here in the comment stream. So, Claudio, you raised an interesting point that mirror.gcr.io may not necessarily mirror multi-arch requests, and he also reminded us that there are dedicated plans for open source projects, which I had started investigating right after last meeting; I needed to gather some more data on exactly how many images we pulled, where, and why.
B: Is there somebody here who knows a bit more about the multi-arch use case? Because that part I didn't quite get. It seems to me that for the common things mirror.gcr.io would work, and so I'm not sure why it wouldn't for a multi-arch solution.
C: From what I've checked, there aren't a lot of jobs which are currently using the other architecture types. So for most scenarios we should be fine, but I think there are still too many of them, which will require some solution.
B: Yeah, I think it makes sense that by multi-arch you mean different architectures. That makes a lot of sense now. Thank you.
A: Yeah, sorry about that, I used a buzzword there. Please do call me out when I do that.
C: I've actually listed here the jobs that are using other architecture types. Most of them are... there are three periodic jobs, a couple of presubmits, and a couple of node jobs that I'm not sure if they're periodic or presubmits, or something like that.
D: I think most of the stuff that's not amd64 is not even in our CI at all. I think we should also step back and acknowledge that mirror.gcr.io is a mitigation. It may help us avoid pain, but it's certainly not a solution, because even if we didn't have the multi-arch problem, it may not necessarily mirror the images we use.
D: It's an additional cache that happens to be fast, that we can very trivially roll out to CI to help mitigate some of what's going on in our CI. It doesn't help, for example, end users that need to pull in their own environments. I think we still ultimately either need to get our Docker Hub usage to where it's not rate limited, if that's possible, for any of the images we use, or, you know, move off of it.
C: That's a good question. From Docker Hub we only care about five images, which are the ones most commonly used in conformance tests, and those are in the Docker Hub library. As I mentioned, those are the busybox image, two versions of httpd, and two versions of nginx.
C: So I was wondering if I could make it use nginx images instead, so that would get us from five down to three images.
D: My concern is that, and I don't know that anybody has confirmed this to me yet, but unless I read it wrong, my initial impression was that the rate limiting is per client. So if we get an account that's not rate limited, that could be used by CI, but it doesn't solve the problem that we have tools that are built around rate-limited images, so then all of our users will need to do the same thing.
D: And also, more specifically, someone from Docker commented on a thread a while back and mentioned contacting the CNCF related to that sort of account. I don't know beyond that.
A: So my concern, just personally speaking, is I want to make sure that all of our PR traffic against kubernetes/kubernetes does not grind to a halt on November 1st. So I'm principally concerned with making sure that all of the CI jobs that are merge-blocking and release-blocking for kubernetes/kubernetes work, then expanding out from there. We can talk about CI jobs that involve Kubernetes but otherwise, you know, live on other boards; here I'm talking about Cluster API jobs, I'm talking about kops jobs.
A: You know, the aks-engine jobs; alternate means of provisioning a cluster may require alternate means of plumbing the credentials through. Expanding out from there...
A: You can talk about CI for all of our hundred-and-something sub-projects, and there I feel like we may have most of the common cases captured, and we should advise people on how to move to k8s.gcr.io or something else, and just let them know that they may be experiencing some bumps in the interim. And then, and only then, do I personally start caring about the experience of our developers and making sure that we don't have anything that's causing a developer's machine to get rate limited, because I feel like developers can authenticate and at least get the 200 pulls per 6 hours instead of 100 per 6 hours.
A: So those are other concerns I have about stuff running in CI, which I tried to list here in the notes. Ben, maybe you want to articulate some of this. You know, a lot of the things I believe we have covered are all of the jobs that end up using cluster/kube-up.sh, but not all of the jobs; there are other jobs that use docker-in-docker.
D: So docker-in-docker should be trivial, if we haven't done it already; I think I actually did already, I just need to confirm before this comes up, to enable mirror.gcr.io. It's a really simple configuration change, and we control that. For kind, it's a bit of a layering violation to be using that, because it doesn't necessarily make sense for all users and it's not clear how it would inherit that from the host, so we may have to think about how we do that.
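The "really simple configuration change" for docker-in-docker presumably amounts to pointing the Docker daemon at mirror.gcr.io as a registry mirror, roughly like the sketch below; this is the standard daemon setting, not necessarily the exact change that was made:

```sh
# Sketch: configure the Docker daemon to try mirror.gcr.io before
# falling back to Docker Hub (standard registry-mirrors setting in
# /etc/docker/daemon.json), then restart the daemon.
cat > /etc/docker/daemon.json <<'EOF'
{
  "registry-mirrors": ["https://mirror.gcr.io"]
}
EOF
systemctl restart docker   # or however the daemon is restarted in that image
```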
D: I think for the e2e tests, that's probably the main thing. So all of the image builds and whatnot for core Kubernetes don't use it; all of the things we run in presubmit or postsubmit don't use it. The only things that do are, say, the e2e image or something, which is built much more infrequently. But we have the images that we use for e2e pods that Claudio was mentioning; for those, I think we probably want to get them onto k8s.gcr.io.
D: So another thing that hasn't come up yet is we might look at extending the image promoter system to allow promoting images that we don't control into some registry, at least a staging one or something along those lines, that we can use for things like e2e, so that we don't actually have to host our own nginx image built from source.
C: So similar to what I suggested last week for the busybox image, right?
D: Yeah, I think so. I apologize, I'm drowning.
A: I'll stop sharing real quick just so we can see faces while we're talking. So my impression is the image promoter right now doesn't support promoting from arbitrary registries; I think it only supports promoting from gcr.io.
A: I would see nothing wrong with setting up jobs that, you know, have a very basic Dockerfile that's FROM the Docker Hub hosted image and ends up landing in a staging repo. My thought would be the e2e test images staging repo that we're already using for things like agnhost and all of the other images that are used by our e2e tests.
D: Fair suggestion. I actually think it would be relatively straightforward; I mean, even without a Dockerfile and without a docker image, we could just write some cloud build jobs that just push to staging, like pull, tag, push, or something along those lines. Or even someone that has credentials could do it manually; I think we allow manual pushes to the staging registries. And then promote those, or serve out of staging.
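The manual pull/tag/push flow mentioned would look roughly like this; the image tag and the staging repo name below are placeholders, not the real destinations:

```sh
# Sketch of "pull, tag, push" into a staging registry that jobs (or the
# image promoter) can then consume. Names and tags here are placeholders.
docker pull docker.io/library/busybox:1.29
docker tag  docker.io/library/busybox:1.29 gcr.io/k8s-staging-example/busybox:1.29
docker push gcr.io/k8s-staging-example/busybox:1.29
```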
D: Just as possibly an option, if we have some concerns about why k8s.gcr.io is hosting busybox or something, it might make sense to just leave it in the e2e staging images, which are a little bit less visible.
A: Okay, so it sounds like one concrete task that I will own is reaching out to Docker Hub and the CNCF to look at either an open source set of accounts or paid accounts.
D: All right, I think I saw a hand from Ben. Yeah, I probably know that best. I'd be curious to know up front if there's any possibility of setting up registries that aren't limited on that end, or if it does have to be the client. If it has to be the client, then yeah, we'll need to plumb that through.
D: That's my read as well, but if we do wind up talking to Docker Hub I'd like to confirm that, because that's not as nice for users versus something like GCR, where we can just pay on the hosting side and not make all of our users worry about it. Right.
D: If we do drop that, can you send me a ping periodically on Slack or something? GitHub notifications have been a bit of a tire fire for me.
D: I think most projects moved to alternative free hosts, like possibly GitHub's package registry. I think we should look at those for some usage, but we may not be as inclined to migrate to that versus, say, GCR, which we kind of already have tooling around and whatnot.
A: Okay, and then I feel like the only open question I have is: what do we need to do for kind?
D: Yeah, that one's been a little bit more of a sticking point, I would say. For most of our upstream stuff, like the e2e images, it makes sense to host them alongside other things in k8s.gcr.io. But for kind, we risk causing issues with our Chinese contributor and user base, which so far has been able to rely on being able to create a Kubernetes cluster by pulling one image from Docker Hub, which is actually available behind the great firewall.
D: So just moving to k8s.gcr.io is not a great option there. The other thing is we actually have been using mutable tags for the moment, which is not something the image promoter allows.
D: It's a bit TBD what we should do for that one. And then there's also, if we were going to rely on mirror.gcr.io, we may need to look at in what way it would make sense to plumb that through in CI, or any special docker pull credentials or whatever, both of which should be doable; I'm just not sure how we want to do it.
D: I think the next thing I want to do is investigate the GitHub registry, because I think that's also a potentially attractive option for our smaller projects that don't necessarily need to be hosted in k8s.gcr.io or have all the promotion and whatnot. There are some projects that are, you know, using GitHub Actions; it may be nice if they can just do that, have all the credentials handled for them, and no fuss.
D: I've heard it's supposed to be available, but I need to confirm more about how well it works. I'm not sure if that's an option we should move forward on, but I want to investigate it.
A: That sounds worthwhile. My knowledge is probably a couple months outdated, but I feel like RelEng, the release engineering team, was messing around with this for a little while, and they found that GitHub's package registry was too flaky when it came to pulling artifacts reliably.
D: Interesting. Well, I also believe they've had a major rewrite since, which is part of what I've waited for: they didn't let you pull by digest, which was a non-starter, both for my own concerns around pinning images and for containerd, which always derefs to the digest and pulls by digest.
A: Yeah, okay, I kind of want to move us forward in the agenda. Do we have any other open questions on Docker Hub? Okay, so this brings me to my next open question. I sent out an email thread about this a little while ago: we are going to not have our regularly scheduled meeting on November 3rd, just because that coincides with the U.S. election.
A: A number of other SIGs have also sort of canceled or deferred their meetings. When I started looking at scheduling the next meeting after that, as regularly scheduled it falls during KubeCon, and I imagine people might be elsewhere or otherwise occupied.
A: So my question to folks is: we could have no meetings in November, and our next regularly scheduled meeting would fall on December 1st. My thought was, given everything we've just discussed about Docker Hub, it might be worthwhile to have a meeting on November 10th, just to kind of see where we're at as far as mitigating the Docker Hub stuff. The other option is, if we're shifting one meeting off by a week, we have another meeting two weeks from then, November 24th. Do people have any strong opinions on how many meetings we should have in the month of November and when we should have them?
D: I do think we should at least have one, and I actually think it's a good idea to move them around these events, which we're otherwise going to have understandable attendance issues with. I don't know if we need all of them; I think the end of the year is usually a slow time anyhow, but we actually seem to have a number of things to discuss at the moment.
D: In particular, the Docker Hub changes roll out between now and the next possible one, so it'd probably be good to have a meeting after that to just circle back with everyone on this.
A: So, all right, I will reschedule the November 3rd meeting to November 10th and delete the other meeting. Okay, next thing on the agenda, and I'll share my screen for this again, is just to kind of check back in on where we are at with CI policy updates.
A: So I tried to lay out what the next steps are supposed to be for all the build jobs. To recap, there are two Google Cloud buckets, kubernetes-release-dev and kubernetes-release-pull, which...
A: So I proposed the replacement buckets, k8s-release-dev and k8s-release-pull, and what we need to do now is create jobs that write to those new buckets and then start changing over jobs to consume the builds that are placed in those buckets. I think Carlos from the release engineering team created a canary job for the build-fast job, and Arno has opened up a PR for the build job, just for release-blocking, right, and so now the next step is to have jobs consume those artifacts.
A: So, once... I was curious what the holdup was on implementing that, and when I went looking, I discovered that kubetest had a bunch of hard-coded locations in it. So I added this pull request to add flags to tell kubetest where to extract its release artifacts from, and my plan was to change over a job this afternoon, and if that looks good, I think we should change all the other jobs over.
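Conceptually, the switch-over means each job's kubetest invocation stops extracting from the hard-coded kubernetes-release-dev bucket and points at k8s-release-dev instead. The sketch below is only an illustration of that shape; the flag name is a placeholder for whatever the pull request actually adds:

```yaml
# Sketch only: a job's kubetest args extracting CI builds from the new
# bucket. The bucket name comes from the proposal above; the flag name
# is a placeholder, not necessarily what the PR adds.
args:
- --extract=ci/latest
- --extract-ci-bucket=k8s-release-dev   # placeholder flag name
```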
A: The complication with the build job is that we discovered it doesn't just write its artifacts to a Google Cloud bucket; it also has this flag where it publishes images to a GCR repo called kubernetes-ci-images. I have no idea what project owns that repo, so it's unclear to me whether we would be allowed to continue to write to that or if we would need to create a new staging repo to write to.
A: At the time I tried to go look to see what these images are and what they are used by; code search was down at the time, but I think I grepped around in the Kubernetes code base and it looks like kubeadm makes some references to it, so I'm assuming that means most of the Cluster API providers probably have that repo hard-coded somewhere in them, and they may need to be updated.
A: Assuming we take care of that, I think we are good. And just to be clear, this is something that does not require any google.com access to do; it just requires somebody to make the changes to the jobs, send them out, and make sure that they look okay.
A: So it is my hope that this is something the release engineering team, maybe the CI signal team, could help out with, and I think we'd then be able to finally close out the big three from our original policy issue, which I'm trying to find right now. Okay, so I'm just trying to navigate back up to the umbrella issue.
A: Rob, I hate to put you on the spot, but I feel like this is something that had been discussed by the CI signal folks over in SIG Release. Do you have any updates on that?
A: Yes, I had... sorry. So if we finish out the build jobs, we'll have finished out the top three, right? And then the other things that could happen right now too, in fact, are mandating that all the prow jobs have contact info (so this issue, we both still have to look at this, yeah), and then sort of removing any jobs that are just egregiously failing forever; like, that maybe is the signal that nobody's actually maintaining them or watching them.
A: So that's sort of the band-aid or the stop-gap, and then how would we carry this policy forward? How would we declare: look, if your job has been failing for more than N days or N weeks, we're going to remove it. So figuring out what that policy should be, crafting it, and rolling it out.
G: I suppose, from our point of view, I agree with the policy, and I suppose I'd just be concerned about doing that in such a way that we explain the policy to the community. And, I suppose, there are multiple ways of going about this; I think I'd have to take guidance on what the best way to publish the policy would be and how to communicate that, but I presume it's just sending out your formulation of the policy here, which I think is reasonably straightforward. We can put some numbers on how long a job has to continuously fail before we go, okay, right, this is suspect; publish a policy, then just announce it and then implement it. Is there anything here that's really fundamentally hard?
A: To recap, we've got this query that is run daily, and it shows which jobs have been failing the longest. These would probably be the egregiously failing jobs; some of these have been failing for coming up on 900 days. And we could probably have something that just looks at this periodically and could generate a report or send an email out or something, in terms of order of operations.
A: So generally we blast stuff out to kubernetes-dev when we're announcing policies, looking for comments on them, you know, allowing enough time for lazy consensus. But another thing that could be helpful to do here is to figure out who owns these jobs.
A: We could maybe establish who owns the jobs and then not just send to kubernetes-dev, but also send to those SIGs or those people specifically, as maybe a way to raise signal. So it could be that we want to finish this issue first, yeah. And my rough guess at how I would assign ownership would be to just take a look at where the jobs are in testgrid right now.
A: Whoops, that's not going to work. Sort of when we were working on, you know, measuring how we were doing on CSI policy, or sorry, CI policy, I had created the prow job report tool in test-infra under the experiments directory, which outputs a bunch of stuff to CSV, and then I imported that into Google Sheets, and so that report lists all of the dashboards that a job is on.
A: It takes a good guess at which dashboard that job should belong to. So for cases where jobs have multiple dashboards assigned to them, the most important dashboard is whether or not it's on sig-release-master-blocking, right, but then secondarily, which other SIG name or working group name does that job belong to. So the evil, noisy approach we could take is to just assume that if it's on your dashboard, you own it, and then we put your SIG's mailing list as the alert address, and then we say: you're going to get alerts for these jobs, do you want to continue to get alerts for them?
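Concretely, the dashboard and the alert address live as TestGrid annotations on each prow job, so "put your SIG's mailing list as the alert address" would look roughly like the sketch below; the dashboard name and mailing list are made-up examples, not real config:

```yaml
# Sketch: TestGrid annotations on a prow job. The dashboard name and
# mailing list are made-up examples, not real config.
annotations:
  testgrid-dashboards: sig-example-misc
  testgrid-alert-email: sig-example-test-failures@googlegroups.com
```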
G: Let me take a look at this. I backed away from this for a time to break the back of the report, and this week I'm hopefully going to finish out the status report, and then I'll get stuck back into this and figure some things out.
A: Sure, yeah, my thought is this could be... oh sorry, I see stuff in chat too. Arno, were you saying you wanted to work on this or something else? I missed that.
A: Okay, but, like, this... I think CI signal has a bunch of shadows too, so it could be a great thing to point them at.
G: A lot of subs on the bench, as I said before, yeah. Yeah, we could perhaps talk about this offline, if that's okay, and then I'll raise an army to work through this, I think.
A: Okay, any other questions, comments, or concerns on the CI policy stuff?
A: All right, I super appreciate everybody's help in rolling this out. I think it's pretty clear CI has gotten a lot better in the project since then. I know I locally have this Grafana instance that I'm running; eventually, one day, maybe I'll figure out how to actually get it up in the cloud and make sure the credentials aren't broken.
H: The release-blocking job, we originally ran that out of our cncf.io prow and it was able to function. It's in the agenda; I think the first two links there are where it succeeds on our prow and it fails on the Kubernetes prow, and we tried three different variations on it just to make sure, but there's something slightly different. I suspect it's in the sidecar, maybe; I think there are decorators, and we weren't able to get decorators working on our prow config.
H: Yeah, and one of the decorator things is the entrypoint, so possibly there's some different behavior on the entrypoint. I think I also included a link to our docker image, or the code underlying it.
H: It seems to work on all variations on our side, and it's just been difficult to understand what's happening when it's running there.
D: Go ahead. Are you saying that you are running it without the pod utilities in your cluster, and you're running it with the pod utilities in our cluster?
H: That's correct, yeah.
D: So I would guess, then, that it is... so there's the entrypoint mechanism in a Dockerfile that says run this as the entrypoint, and then you can have arguments to it dynamically.
A: I guarantee that what this output is here is the entrypoint container provided by pod utils. So, first off, the purpose of the entrypoint container is to be able to automatically wrap the command of the test container that you're providing, and then it will take standard output from that and dump it to the build log, which gets uploaded to GCS, and it's also responsible for automatically uploading anything that lands in the artifacts directory to GCS.
A: So if we don't use the entrypoint container, you don't get any logs. The entrypoint container almost undoubtedly runs as root. So I think what Ben and I are trying to say is, one suggestion is to try using pod utils on your CNCF prow; that way you could debug it there, where you maybe have more access to the nodes and the cluster, and you could maybe have the command that you use for the container...
A: ...be the script that tries switching to the postgres user before it runs stuff. What I don't know, and this is my lack of Kubernetes knowledge showing, is whether a security context is going to yell at you.
D: For doing this? I don't think we have jobs doing that currently, but you also probably can just set the user ID on the container in the pod spec in the prow job; you'll just need to find out what UID that is.
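Setting the user ID in the prow job's pod spec would look roughly like the sketch below; the image name and UID are placeholders (the real UID is whatever the postgres user maps to in that image):

```yaml
# Sketch: run the test container as a specific UID via the pod spec
# embedded in the prow job definition. Image and UID are placeholders.
spec:
  containers:
  - image: example.test/apisnoop:latest
    securityContext:
      runAsUser: 999   # placeholder for the postgres user's UID
```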
H: Also, in our entrypoint script, if you want to bring that up, there's a point where it compares the arguments provided and the UID, right there in the same place, to decide whether to set up postgres, and that's probably where we're running into this difference in arguments. So, to close things up real quick: we'll try and get pod utils up and running, and we'll see if we can't modify the args to just have a simple entrypoint and likely just run as root, because that allows us, when we run as root, to chown, and then, if our first argument on line 51 is postgres, it will do the initdb that we were currently lacking.
H: So either we weren't running as root, or the db wasn't getting set up, and we probably need both those if statements to trigger, including the one on 30... looks like line 32. Okay, thank you for that help.
H: The one other issue that we have is related... did it get on there? Yes, advance.
H: It is so much fun. In order for us to correctly measure the amount of conformance coverage that we have, we need to have the audit policy not remove stuff on us. However, the recommended... or at least, it's not documented anywhere, but there is a variable that says please set this to override the audit policy, and it's all deep in that ticket. I have gone through and run that manually, like this script, and generated the audit policy.
H: It is a valid audit policy, but when you look at the logs it says it's missing values or it's invalid YAML. And also, on our prow cluster, I have not been able to get up and running with running prow jobs on kind, the ability to run the Google jobs like this, in order for me to be able to debug it; plus the scripts are pretty hairy.
H: Let's modify the kind conformance job so we don't have to run it in Google and I can run it locally, but we need to modify it to have an audit policy so that it logs and creates the audit log in the same way that the other jobs do upstream, because we consume that audit log and bring it into the APISnoop db to see the user agents that are passed to the API server, to calculate coverage.
D: I mean, finding out why it's not valid seems like something we ought to be able to do. I don't know if that's the correct path forward, though; I'd probably want someone else's opinion on which jobs we should use, given my inherent bias.
A: Let's see, kind has its conformance job, so we could use kind. It's valid... what's happening here... What I would do for the next step of debugging this job is, I would recommend you use cluster/kube-up.sh, or you could even run this on your prow instance, where you're going to have, you know, more access to SSH to things. You could probably set it up so that instead of automatically tearing the cluster down, you modify the job so it just leaves the cluster up, and then you could go SSH to the node in question and see for yourself what the YAML actually is.
H: On that node, I sort of did that. I ran this job, but I set the entrypoint to sleep and the args to infinity, ran the generation of the YAML file, pulled that out and used it with kind, and the policy was fine. So if something else is happening, then, again, we probably need to figure out how to get pod utils up in such a way, in prow, which is running on AWS, as to set up the Google credentials correctly, because without those pod utils we're not going to get the artifacts.
A: Yeah, so if this is just about capturing API audit logs, you can always do that with kind, that's true. What APISnoop has been doing in the past is looking at the default jobs' audit logs versus the conformance jobs, so you get coverage of what has been tested versus what is tested with conformance tests. And the reason we said no, don't turn on... don't put events in the audit logs, is because Kubernetes generates a lot of events and the audit logs get huge, and there are performance implications.
H: I have a link: it's actually in the cncf/apisnoop repo. There's a kind folder, and that has the kind config and the audit policies. If we could just find a way to update the existing conformance job for kind to do this... and we just need to figure out where the audit logs go, because I'm not sure, but...
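For reference, enabling apiserver audit logging in a kind cluster is usually done with a config along these lines. This is a generic sketch with illustrative paths, not the actual files; the real ones are in the cncf/apisnoop kind folder mentioned above:

```yaml
# Sketch: kind cluster config that mounts an audit policy into the
# control-plane node and turns on apiserver audit logging. Paths are
# illustrative; the audit log ends up on the node under /var/log/kubernetes.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraMounts:
  - hostPath: ./audit-policy.yaml
    containerPath: /etc/kubernetes/policies/audit-policy.yaml
    readOnly: true
  kubeadmConfigPatches:
  - |
    kind: ClusterConfiguration
    apiServer:
      extraArgs:
        audit-policy-file: /etc/kubernetes/policies/audit-policy.yaml
        audit-log-path: /var/log/kubernetes/kube-apiserver-audit.log
      extraVolumes:
      - name: audit-policies
        hostPath: /etc/kubernetes/policies
        mountPath: /etc/kubernetes/policies
        readOnly: true
        pathType: DirectoryOrCreate
      - name: audit-logs
        hostPath: /var/log/kubernetes
        mountPath: /var/log/kubernetes
        readOnly: false
        pathType: DirectoryOrCreate
```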
A: Yeah, thanks for that. So I think I would say look at using kind for this; that seems like a faster approach. Wading into debugging cluster/kube-up seems like the other option, if you want to continue using that.
A: Okay, Andrew, you asked to create a new repo called e2e-framework in kubernetes-sigs. I'm not gonna lie, the only reason I haven't slapped my plus-one on this is because I'm wondering if we should use the existing repo we have in kubernetes-sigs called testing_frameworks.
A
Which,
apparently,
is
retired,
never
mind
so
yeah
we
should
probably
create
a
new
repo.
Do
you
want
to
speak
to
it
a
little
more?
I
know
we've
sort
of
we've
chatted
about
this
in
in
group
chats,
but
I
don't
know
if
it's
chatted
to
more
people.
That's.
J: Yeah, okay, so just to add more context: there was some work that was being done by Alejandro and Tim and I sometime last year, and the main motivation from my end... So the work we were trying to do was to take the internal end-to-end testing framework in kubernetes/kubernetes and move it to a place where it's more easily consumable for cloud providers, CSI drivers, and kind of the entire ecosystem that has been writing e2e tests based on that, and so the main motivation was...
J
Let's
move
that
the
first
motivation
was:
let's
move
that
thing
to
staging
so
that
things
can
depend
on
it
without
pulling
in
all
the
go
depths
from
kubernetes
kubernetes,
and
then
I
know
you
know.
Stick
testing
folks
here
had
concerns
around
moving
the
staging
moving,
the
current
e3
framework
to
somewhere
that's
more
publicly
consumable,
and
so
the
alternative
were
that
we
kind
of
landed
on
was
okay.
J
Let's
just
build
a
really
minimal
end-to-end
testing
framework
from
the
ground
up
that
is
going
to
address
cloud
providers,
csi
drivers
and
other
projects
that
need
some
sort
of
like
base
end-to-end
testing
framework,
but
may
not
necessarily
want
to
pull
in
the
giant
framework.
That's
in
kubernetes
careers
today,
and
so.
Hence
I'm
making
this
request
to
put
something
in
community
six.
A: Yeah, the only other thing that I feel like came up in our discussion is you wanted to make it clear that this was experimental. You were looking for some place to kind of experiment and iterate before sort of declaring, okay, this is the way, let's all use this.
J: I think that's TBD, yeah. So I think the starting point is getting the test runners up and running, like wiring up kubeconfig and all that stuff, and then we probably do want some sort of pluggable interface for doing cloud provider operations to test certain behaviors, but yeah, I think that's TBD at the moment.
D: This addresses the fact that all the cloud providers need to move out but have been using the e2e testing framework; while we're maybe moving the providers themselves out, we're not necessarily moving the test binary framework out. I super appreciate taking this approach.
D
I
think,
if
nothing
else,
I'm
really
hopeful
that
this
proves
out
some
as
like
a
testing
ground
for
ideas,
even
if
it
winds
up
being
cloud
provider,
testing
specific,
we
may
know
what
we
want
to
do
to
provide
something
for
any
other
use
cases
based
on
how
this
goes
like
like
we
don't
necessarily
have
to
make
this
one
completely
general.
This
is
probably
my
only
comment
is
maybe
if
we
are
focused
on
the
cloud
providers,
we
might
even
want
to
tweak
the
name.
D: Yeah, I'm hopeful it won't be, but given the, you know, already big ask of trying something different here, I wouldn't want to block on getting too obsessed with being general.
A: Okay, well, I slapped my plus-one on there as a chair. I want to be super respectful of everybody's time, and thank you all for staying three minutes late. Thank you all for your time. I look forward to seeing you all on November 10th. Happy Tuesday.