From YouTube: A Tour of CI on The Kubernetes Project
B
Okay, so hello, and welcome to A Tour of CI on Kubernetes with Dan and Rob. What we're going to be doing today is talk a little bit about the work that CI Signal does to report flakes back to the community of developers who are working on a release. The CI Signal team is part of SIG Release, and what SIG Release does is work to shepherd a range of features and fixes into the next upcoming release of Kubernetes.
B
As part of that work, there's a range of cross-cutting concerns and activities across a number of teams, ranging from bug triage to sharpening features through the KEP process. What the CI Signal team does is monitor the signal coming off of end-to-end tests as part of CI in Kubernetes, so what we are primarily focused on is detecting what are called flaky tests.
B
What we're going to do is look at the tooling that we use to monitor the signal; we're going to tour around some of those tools, look at how we report flakes, and then I'm going to hand over to Dan, who's going to show us examples of fixing the flakes and working with the community to get flakes eradicated, because they are a scourge. And in terms of how CI Signal works as a team:
B
One of the awesome things about Kubernetes is that, as a project, it works hard to bring on new contributors, and that's one of the things that really attracted me to it. When I first came to Kubernetes, I hung out with and attended a lot of meetings of the contributor experience group, and it was through them that I learned how the project works and how the project is structured, and through their help...
B
...I ended up finding out that my area of interest was, naturally, CI, so I went over to the release team and worked on CI Signal. I was the team lead for the release just gone by; Dan was the team lead for the 1.19 release, and I shadowed Dan on that. So how does shadowing work? There's an application process whereby you can apply to become a member of the CI Signal team, and of other teams as well.
B
And I just want to talk about that a little bit because, as a team lead, your first task is to go through the applications and figure out who you're going to select. The one thing I want to note for people who apply for shadow roles on teams like CI Signal is that the competition is fierce. That is to say, for the 1.20 release I think I had in excess of 80 people apply to be shadows on the team, and from a management point of view...
B
...we can only supervise and train up a team of about four or five at best. So if you applied and didn't get onto the team, don't be disheartened: be aware that there's a lot of competition, and be aware that it's possible to apply again; there will be more releases.
B
The other thing I would say about applying for roles in the community like that: it's important to remember that there's nothing stopping you from doing some of the work before you arrive and apply. There's nothing stopping you from looking for flakes and reporting flakes, and if you're interested in becoming a shadow on the team, it will stand you in good stead if you follow the instructions that we provide today and just muck in and do the work.
B
So with that in mind, bear in mind that we have a Slack channel, #release-ci-signal, which I've linked to in the docs there. If you have questions about this talk or about participating, you can always ask those questions there. The other important thing to do is to join SIG Release; I have links there about the SIG Release team and how you can participate in its meetings. So we're going to get into Testgrid, and I have a small trigger warning.
B
One of the things that we're aware of in the Kubernetes project is that we're working towards replacing harmful language with neutral language wherever possible, and I'd just like to give a shout out to the Naming working group, which is being set up to look at this across the project. On GitHub, in our Git workflow, we have branch names that bubble up into CI; we're aware of that, and over the course of the coming weeks and months...
B
...that's something that we'll be working on to make changes to. So let's just get into it and have a look. The first tool that we have in the chest in order to look at how CI works on Kubernetes is Testgrid. So Testgrid, from a release...
B
...point of view, presents us with a wide range of jobs that run end-to-end tests across multiple runtimes, and this is our go-to tool for figuring out whether or not a job is flaky and whether or not individual tests are flaky. So this is a summary of the blocking group of jobs, and these are jobs that are considered so important that if they were in a bad state, that would block a release. But we have some flakes here that we can look at.
B
Take the Conformance GA-only job now. How Testgrid works is that it basically provides a job view of all of the tests that have been run for an individual job, and in order to make sense of what we're seeing here, there are a lot of tricks that I use to figure out what's going on. The column headers here give us an indication of the periodicity of this job: this job runs twice a day.
B
We have times logged in Pacific Standard Time there. If you want to flip that over to your local time (I'm coming from Dublin, Ireland, so that's GMT), I can get a sense that it runs at 5:30 in the morning for me and 5:30 in the afternoon, roughly every day. Each cell in Testgrid represents a test result: green means a test passed, and red means that it failed. So in terms of managing information and trying to get a handle on looking at these failed tests...
B
...we have a couple of tricks here, a couple of options that we can make use of. Typically, when I'm hunting for flaky tests, I exclude all tests that have not failed, and that then gives me an easier chunk of information to manage. The other thing that I would often do when I'm looking at this test here:
B
If I mouse over it, the test name is highlighted on the left, and then at the top I can see the date on which the test ran and the time it ran. I have a number here that the tooltip says is the build number, but it's really the prow job number, and then below that is the commit ID from Kubernetes against which this test ran.
B
So the first thing that I do when I'm trying to figure out whether this is a flake: I'll mouse over the test, then mouse over to the right, and I'll note that the Git commit ID has changed. If I mouse over to the left, the Git commit ID has changed as well, so I'm not 100% clear that this is a flake. What I want to do is go back in time.
B
If I go back in time, I can see another incident of this test failure, and if I mouse to the right here and mouse to the left, you'll note that I have the same commit ID for this failure. So this is pretty much the definition of a flake. For the commit ID that ends in 3b90f, here we have a pass...
B
...here we have a fail, and here we're back to passing again. So, by definition, this is producing a non-deterministic test result. For any of the cells, I can click through to get to this view of the job. This view of the job is presented by a tool called Spyglass, and Spyglass allows us to see how the job was run in prow. Straight away, we can see that failing test, and if I just expand on that, we can see that this is the test error message.
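The heuristic described above, where the same commit ID shows both a pass and a fail, can be sketched in a few lines of Go. This is an illustrative sketch, not Testgrid code; the `Result` type and `isFlaky` function are invented for the example.

```go
package main

import "fmt"

// Result is one cell from a Testgrid row: the commit a run was built
// against and whether the run passed.
type Result struct {
	Commit string
	Passed bool
}

// isFlaky reports whether any single commit has both passing and
// failing runs: a non-deterministic result with the code held constant,
// which is the working definition of a flake used in the walkthrough.
func isFlaky(results []Result) bool {
	passed := map[string]bool{}
	failed := map[string]bool{}
	for _, r := range results {
		if r.Passed {
			passed[r.Commit] = true
		} else {
			failed[r.Commit] = true
		}
	}
	for c := range passed {
		if failed[c] {
			return true
		}
	}
	return false
}

func main() {
	// Pass, fail, pass against the same commit, as in the walkthrough.
	runs := []Result{
		{Commit: "3b90f", Passed: true},
		{Commit: "3b90f", Passed: false},
		{Commit: "3b90f", Passed: true},
	}
	fmt.Println(isFlaky(runs)) // prints "true"
}
```

With the pass/fail/pass sequence from the walkthrough, `isFlaky` returns true, because the code under test was held constant while the result changed.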
B
So this looks like a straightforward test failure, in so far as, instinctively, I know that this probably isn't a runtime issue, given the name of the test. The name of the test here: I might actually speak to this, because this is a trap for new contributors. On the left-hand side here (sorry, just going back), we see this test name.
B
If we want to drill down into the test, we could attempt to do a search for it, and we have a tool, Hound, that allows us to search the repos. But if I pop that into Hound, we'll see it comes back with nothing. This is because this is not really the true name of the test.
B
So this is the test that's flaking for us. Dan can speak more to this, and we'll see more of it when Dan does his part: the name of this test is "evicts pods with min tolerations", and it's tagged as disruptive. The rest of that name in Testgrid is formed by traversing down through the hierarchy of the end-to-end test suite that we use to do end-to-end testing. Okay.
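That traversal can be pictured as a simple join: each level of the suite hierarchy contributes one segment of the flat name shown in Testgrid. A minimal sketch, with invented segment strings rather than the suite's real hierarchy:

```go
package main

import (
	"fmt"
	"strings"
)

// fullTestName mimics how a Ginkgo-style suite flattens its nested
// Describe/Context/It blocks into the single long name Testgrid shows:
// one segment per level of the hierarchy, joined with spaces.
func fullTestName(segments ...string) string {
	return strings.Join(segments, " ")
}

func main() {
	// Hypothetical hierarchy, outermost block first.
	name := fullTestName(
		"[sig-example] SomeController",
		"when a pod has a short toleration",
		"evicts pods with min tolerations [Disruptive]",
	)
	fmt.Println(name)
}
```

This is also why searching Hound for the full Testgrid string finds nothing: no one source line contains the whole joined name.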
B
If we were to do that from scratch, we would go to Kubernetes issues and, if I dropped in this test name, in fact, I should find it. We can see that this test has already been logged by ScrapCodes; that's Prashant. He worked on the 1.19...
B
Well, he worked on the 1.20 release formally as a shadow, but he actually started working freelance, as it were, on 1.19, and I showed him how to do this work. This is an example of a logged flaky test, where we go through and describe which jobs are flaking and which test flaked, and then we have basically all of the evidence to back up our flake report.
B
So I'll just show you how that works in terms of filling that out quickly, and then I think that will be me done, and I think we will go on to your stuff.
B
Next, when we log an issue, we have a range of issue types that we can report, including failing test and flaking test. If we just have a quick look at flaking test, we see that we have a GitHub template that allows us to enter in all of the information to describe the flake that we have found. Now, I think I've linked to Prashant's logging of this in the HackMD.
B
That's a good example of how to fill this out. The one thing that I want to just look at here is the triage tool. So in Testgrid we can look at jobs from the point of view of all of the tests that run in a job, okay; but there are other views that are interesting to look at when we're trying to figure out why a test is flaking.
B
So we have a tool called Triage, and what Triage does is look at the output of end-to-end tests from the point of view of errors in those tests, so it'll group test failures by error. So if we were to take this particular test and do a search...
B
...do a search here for this test in Triage, we'd get information that's useful for test maintainers to figure out what's going on here. One of the key things to figure out when you're trying to deflake a test is when and where the test is flaking. So if we look down through (and I might make this a little bit bigger)...
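Conceptually, Triage clusters failures by their error text, so one underlying problem shows up as a single group across many jobs. The sketch below illustrates that idea with invented job names and a deliberately crude normalization step (collapsing numbers) standing in for Triage's real clustering:

```go
package main

import (
	"fmt"
	"regexp"
)

var numberRuns = regexp.MustCompile(`\d+`)

// normalize collapses digit runs so that messages differing only in
// timeouts, ports, or counts land in the same bucket.
func normalize(errMsg string) string {
	return numberRuns.ReplaceAllString(errMsg, "N")
}

// groupByError buckets failing jobs by their normalized error message,
// the way Triage presents failures grouped by error rather than by job.
func groupByError(failures map[string]string) map[string][]string {
	groups := map[string][]string{}
	for job, errMsg := range failures {
		key := normalize(errMsg)
		groups[key] = append(groups[key], job)
	}
	return groups
}

func main() {
	failures := map[string]string{
		"gce-job-a":  "timed out after 300s waiting for pod",
		"gce-job-b":  "timed out after 600s waiting for pod",
		"kind-job-c": "connection refused",
	}
	// The two timeout failures collapse into one group.
	for errKey, jobs := range groupByError(failures) {
		fmt.Println(errKey, jobs)
	}
}
```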
B
We can see that there were 17 failures as of today; I'll speak to that in a second. So this test failure occurred in this way across a number of jobs, and that's useful information for somebody trying to triage the test. And let me just scroll down a little bit further, because, if I recall from preparing the talk...
B
...I think I might have seen this test fail on a Windows job. Yeah: so that means that it failed when it was run in the context of a Windows runtime. I think that's useful information, isn't it, Dan, for people troubleshooting the test. So I think that's pretty much my walkthrough, Dan. I'm just wondering: are you seeing any questions or feedback in chat? I can't see it.
C
So we had one question around whether we have a bot to report flakes, since you demonstrated when we can tell for sure that there's a flake. I mentioned in the general channel on the Discord server, for folks that are watching on YouTube, that fejta-bot will go ahead and record flakes on presubmits, but we definitely could improve that on the periodics, right? Because...
B
Yeah. Over the course of the 1.20 release, I attempted to do a crazy join between the very, very structured data that we get out of Testgrid and the manually logged flakes in GitHub, and although everything is technically possible, the challenge there is that different people log flakes in slightly different ways, so that's difficult to parse.
B
One of the things I will say is awesome about CI on the Kubernetes project is that the data from jobs that are run is collected and put into a BigQuery database. So our CI process is database-backed, which then means that we have tools like Triage that allow us to slice and dice the runtime data in different ways, which helps us get to the bottom of why things are the way they are. At the moment, there's a problem with getting data across to BigQuery, and people...
B
...people are working on that. You can see there that we have data up to December 1st, but that's something that's being actively looked at and that we're working to fix. And, broadly speaking, the CI Signal team...
B
...although it's part of the SIG Release team, we work closely with SIG Testing, because it's their technology that we're using to deliver CI, and we work with Testing Ops as well. Broadly speaking, the infrastructure which runs CI jobs is very reliable, and jobs are run reliably; any time we hit up against infrastructural issues, it could be to do with configuration of jobs rather than jobs not being run. But I suppose I can hand over to you now, Dan.
C
Sure, sounds great, and that was a great overview of the tooling there, Rob. One of the things we see when folks come to the project, and from our own experience as well: it can be a little bit daunting to understand what URL matches what tool and what each tool is used for, and there are some overlapping responsibilities.
C
We also, speaking for myself and probably the rest of the CI Signal members, have an issue of sometimes calling things by general names. You'll hear a lot of things referred to as "prow", for instance; I hardly ever hear anyone say "Spyglass", and so it can be a little tough to parse that out. So I just want to follow up on what you were talking about by saying: please feel free to ask questions. There's no such thing as a dumb question.
C
There are probably tools that we don't even know about, even though we've been doing this for over a year now, so definitely feel free to ask about that. But yeah, thanks for that overview, Rob. I'm going to go ahead and start sharing my screen here. Okay, I'm getting a "host has disabled screen sharing", so I might need...
B
C
Awesome, thanks, Rob. All right, so I'll go ahead and share my screen here; I always have trouble with this Zoom overlay.
C
All right, so we are set to go here. Rob's given us a nice overview of all the tooling, how to interact with it, and that sort of thing, and there are miles more layers to it; a lot of that you experience when you actually try to track down what issue you're seeing.
C
So in my personal experience, as Rob mentioned, I kind of came to Kubernetes through SIG Release and started out in CI Signal, and did a lot of the issue logging and that sort of thing. The more I did it, the more I became familiar with the fixes that were coming in from different SIGs, and I became more involved with that. So you'll start to see, especially if you spend a lot of time on CI Signal, that you'll start actually fixing some of the bugs yourself.
C
So we're going to walk through some of that. Another advantage of walking through it is that you see how your workflow goes through these different tools we've seen. So I'm going to go through a few cherry-picked issues that we've had over the last few months that demonstrate different scenarios you'll run into when trying to troubleshoot.
C
So this should also be useful for folks who are not interested in CI Signal but who work with a SIG or are doing feature work and end up fixing something that they introduced.
C
So Rob already called out that flakes and failures are the two main categories of failures that we're going to see here with tests, and there are three main root causes you can run into when something is failing. It's either a bug in the code, so the actual Kubernetes code base: there's a bug in an implementation there. That's what you generally think of as the purpose of testing, so that could be one situation.
C
Another could be a bug in the test, which means we're testing the wrong thing, and we'll walk through one of those. And then the last one, which can be the hardest to track down, especially if you're not involved with SIG Testing or SIG Release, can be an issue with infrastructure or tooling, and there are a lot of different things that can go wrong there.
C
So I have a few different examples that we'll walk through, but without any further ado, let's go ahead and start with a bug in the code, which is kind of the most straightforward example. This is an issue I opened on November 11th. This was for an informing job that we have on GCE, ubuntu-master-default, and you'll see the issue format that Rob already detailed; I basically just followed the general process. I got a ping that this test was failing.
C
I went ahead and looked at the board (which will be out of date at this point), went to this job, and saw that it was consistently failing at this point, which is why it would be a failing test rather than a flaking one. We had red all across the board here, so I opened it up and gave it the failing-test label. This job is informing, and we wanted to get this fixed up for 1.20.
C
As Rob said, informing jobs do not have to be passing for us to go ahead and release. That being said, it's definitely concerning when something is consistently failing, no matter where it is. So, once again, I have the output here of what we are getting from the job, and the next thing to note is the SIG, right? So how was I able to determine which SIG this went with?
C
So primarily it had to do with a Docker exec liveness probe. Obviously, Docker is running as the CRI implementation on a node, and that would cater to something in SIG Node, so I went ahead and put that SIG label on it. I was also able to determine the PR that introduced this failure.
C
So once again, as Rob was already detailing, when you go through here, we can look at both the kubernetes commit hash as well as the test-infra commit hash. I'll hop over to the test-infra repo in a minute, because that's where a lot of this tooling lives, as well as the configuration for all these dashboards.
C
So when I actually looked at the job that I opened this issue for, there was a distinct change in commit hash which went from green to red, and so I was able to tell it likely was introduced by whatever happened between those two commit hashes. And if you're not familiar, GitHub actually has a pretty useful way to compare commit hashes (that's not what I'm going to do here), and there are some shortcuts to be able to get this open.
C
But here's just an example I have in my search history: you can actually just supply the first commit and the second commit separated by an ellipsis, and you'll see all the commits that happened between those two.
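That compare view is just a URL pattern: two commits separated by three dots. A small helper makes the shape explicit; the owner, repo, and hashes used below are placeholders, not the hashes from this particular failure.

```go
package main

import "fmt"

// compareURL builds a GitHub compare URL. The three-dot form lists the
// commits reachable from "to" but not from "from", i.e. everything that
// landed between the two hashes.
func compareURL(owner, repo, from, to string) string {
	return fmt.Sprintf("https://github.com/%s/%s/compare/%s...%s", owner, repo, from, to)
}

func main() {
	// Placeholder hashes for illustration.
	fmt.Println(compareURL("kubernetes", "kubernetes", "1a2b3c4", "5d6e7f8"))
	// https://github.com/kubernetes/kubernetes/compare/1a2b3c4...5d6e7f8
}
```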
C
So this is actually for a different failure that I was tracking down that I have cached here, but I'll use this as an example. You can see here that all these commits are part of a single PR, so if whatever started failing went from green to red between those commit hashes, and there's only one PR, we can go ahead and determine that that was probably the PR that introduced whatever caused the failure.
C
So, looking at this PR, we can once again see it's kubelet work, so it is in SIG Node; you could also see that by the labels on it. And then Andrew, who makes lots of different contributions to the Kubernetes project and is a very important contributor, had introduced some new functionality to respect exec probe timeouts. If we go back over here, you'll see I went ahead and tagged him on here and kind of gave some context around the issue.
C
The first thing that I wanted to work out is: if this caused a consistent failure, how was it merged, right? When we look at the release dashboards here, we're looking at periodic jobs. So, as Rob said, that means that we're running them on a consistent cadence, which you can see here at the top as the difference between the hours that they're running. We also have presubmits.
C
So the ones running on a cadence are periodics; the ones that run on a PR before it goes in are called presubmits. So let's just go to a recently opened PR.
C
We can see who our winner is: let's look at this etcd version one here. You'll see that there are lots of jobs here that run against the PR before we merge, and those all have to pass if they are merge-blocking. So how did we pass all those tests and get something in that caused a periodic job, one that is fairly critical, with it being on the informing board, to fail?
C
So I'm going to go over to Spyglass here, as Rob showed, and there are a number of different things up here that are helpful: links to artifacts from different job runs. You obviously have a link back to Testgrid; Artifacts is going to show you all of the artifacts from that test run, so you can see the prow job configuration, the start and stop markers, and then the different output from things like the node logs and that sort of thing.
C
So once again, looking back at this, the prow job YAML is what is going to tell us how this job is configured, and I believe, if we search for it here, the important thing on this job was actually that we are running with the kube container runtime set to docker, and if we flip back over here, this obviously had to do with the Docker exec liveness probe.
C
Node end-to-end: so we have pull-kubernetes-node-e2e, which sounds pretty similar (well, not that similar) to the GCE master default job. If we look at the end-to-end tests here, we can see that it's running similar ones, but if we hop back over to the node end-to-end job and take a look at the prow job YAML, you'll see that it's not specifying docker as the container runtime. And I won't go into exactly how we're able to determine that a different container runtime is being used...
C
...but basically this is using containerd under the hood. So the issue here was that we had run tests against one container runtime, but this actually was a change in behavior on a different container runtime: docker versus containerd. We could talk about dockershim, some of the deprecation around that, and what the difference between docker and containerd is, but I'll leave that to a different talk. But essentially, right...
C
My initial inclination was that we were using incorrect version markers, but Andrew did a little more digging and we were able to see that it was because of the container runtime. You'll also notice that this was running with a different operating system, as it's obviously running on Ubuntu, so that was something to bear in mind; the presubmit was running with the Google container-optimized OS, and so that was another thing to investigate. Anyway...
C
I believe we talked a little bit about running this test in the specific job. Yep: so you'll see here (and we'll talk about what this means in a little bit) that during this PR, Andrew temporarily added the NodeConformance tag onto the test, which made it get exercised by the presubmit that was running, and so we got an idea of whether it was passing or not. So that allowed us, using the presubmit, to determine whether this was actually going to fix the problem.
C
So that's kind of a walkthrough of a somewhat typical bug in the code: new functionality was introduced, it broke something, it passed the presubmits, and we caught it in the periodics. Another follow-up to a situation like that is determining whether maybe we should be running this test in the presubmits, right? If it's going to cause breaking changes, maybe we need to be exercising that code path.
C
All right, the next thing that we can look at is a bug in the test. So for any of these jobs, essentially what's happening for most of them (and I'm excluding build jobs and things like that), the ones that are actually running things from our end-to-end test suite: what's happening is they're not rebuilding Kubernetes and running it...
C
...each time, right? They're getting the latest build of the Kubernetes release, depending on the job, and they'll download that and then clone the repo for the tests that are in the k/k repo. Just to make sure that we're defining everything: k/k is a common shorthand for the kubernetes/kubernetes repo, as there are many repos under the kubernetes org.
C
So if we look in test/ here, primarily we're thinking about these end-to-end tests. So what happens is these jobs will go ahead and download Kubernetes, they'll run it, and then they will exercise the tests in the test subdirectory here against the version that they've downloaded. If it's something like sig-release-1.20-blocking, you'll see... and let's look at one here real quick just to demonstrate this.
C
Well, we won't get into fast builds today, but essentially they're building for a single architecture and operating system, rather than the general build, which builds for all. So yeah: based on the version that we're trying to test, we'll download a different version of Kubernetes, clone that branch's tests, and run them against it. All right, so back to this example of a test failing because of a bug in the test.
C
This was one that once again flipped from green to red, but it was just flaky, right? We introduced a new test, and it didn't actually cause things to consistently fail, but we could tell from the scrollback (I just have this from context, but we can probably find one here as well)...
C
One of these might be a good example. So this one is having a pretty consistent flake rate, but if you had something where it was green for 30 columns or something like that, and then it suddenly started turning red every other column, that would be a good indication of when the code was introduced that caused the flake.
C
So in this case I wasn't able to pin it down to a specific PR, so you'll see that I have my emphasized "might be related to" here for this PR. I was able to determine when the flake rate increased and guess at the PR. It looks like... let's see if that was the... oh no, that was an issue I actually tagged. So it looks like Morgan here was actually able just to watch for this, and this is an example of the benefits of tagging.
C
So Morgan was able to follow up and say "oh, I know exactly what's happening here", because Morgan had context that I didn't have, which is really useful, and said "I can do the fix here", went ahead and got assigned to it, provided the fix, and tagged me on it. And this is a really good example of what can cause flakes based on how the test is designed.
C
You'll see what Morgan's doing is relaxing the matcher here. This specific test has to do with the resource metrics API, so we're looking at the metrics that are produced, and initially, when this test was introduced, it was "match all elements": basically saying, I expect the metrics that we see to look exactly like what I'm specifying here, no more, no less. In reality, that wasn't what we needed to test here.
C
We just want to see that the two metrics that we were interested in were present, and if we had, you know, a million other ones, that was all right. So you can see how this could produce flakes, right? Because sometimes there may be other metrics and sometimes there may not, depending on what else is running in the cluster and exposing them. So we can just change this to match elements and ignore the extras here, and that turned us back green. So that was a very quick fix here, and you'll see...
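The strict-versus-relaxed matching that caused this flake can be shown without any framework. The two functions below are invented for the sketch (the real test uses Gomega's gstruct matchers): the strict one fails whenever an unexpected extra metric appears, which is exactly the non-determinism the fix removed.

```go
package main

import "fmt"

// matchAll is the strict behavior: the observed metrics must be exactly
// the expected set, so any extra metric from another workload fails it.
func matchAll(observed map[string]float64, expected []string) bool {
	if len(observed) != len(expected) {
		return false
	}
	for _, name := range expected {
		if _, ok := observed[name]; !ok {
			return false
		}
	}
	return true
}

// matchIgnoringExtras is the relaxed fix: the expected metrics must be
// present, and anything else exposed by the cluster is ignored.
func matchIgnoringExtras(observed map[string]float64, expected []string) bool {
	for _, name := range expected {
		if _, ok := observed[name]; !ok {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical metric names, not the real resource metrics.
	observed := map[string]float64{
		"metric_a":        1.5,
		"metric_b":        2048,
		"unrelated_gauge": 7, // extra metric from something else in the cluster
	}
	expected := []string{"metric_a", "metric_b"}
	fmt.Println(matchAll(observed, expected))            // prints "false"
	fmt.Println(matchIgnoringExtras(observed, expected)) // prints "true"
}
```

Whether the strict version passes depends on what else happens to be running in the cluster, which is why it flaked rather than failing outright.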
C
And one thing I wanted to point out here, which Rob actually brought up when we were going through the preparation for this: you'll see this gstruct package here, and let me go up to the top and see where that's coming from. You'll see this is coming from Gomega, and essentially what it's doing is allowing us to match the contents of structs. But I wanted to point out that we use some external frameworks alongside the end-to-end framework in the end-to-end tests.
C
So Ginkgo is used across a lot of our tests, and you'll see it used to do things like set the context or run things before each test. The particularly important one, which Rob was pointing out earlier when searching with Hound, is Ginkgo's It, which prepends information onto the front of this description here. So you'll frequently see... let me go back to one of these and see... this wasn't a failure; let me grab a failure and see if this has it. So, yeah.
C
This is a good example here. It probably prepended "Kubernetes e2e suite", may have added "sig-storage", probably "CSI mock volume", "CSI Volume expansion", and then we probably just saw "should expand volume by restarting pod if attach on node expansion on", which looks kind of like what we're seeing here with "should report resource usage through the resource metrics API".
C
So when you get more familiar with interacting with some of these tests, when you see a failure, it's a little bit easier to go ahead and pull out what to search for, but it can definitely still be challenging. Also, the tags here, like sig-storage, can help you identify...
C
...what part of the end-to-end testing framework to look in. Rob, did you have something you wanted to bring up?
B
It's almost like a flattened hierarchy of the Ginkgo traversal down to the test. What Ginkgo is trying to do is describe in English what is happening, where it's happening, and how it's happening. That's what you typically find in end-to-end test frameworks, where you almost want to be able to just have an English statement of what the expected behavior of the test is. So I think the ultimate...
B
I think the ultimate next step would be to have Cucumber, and just be able to have end users read what is happening under the hood and extract out those English statements, so that people who aren't into the tests but are interested in the expected behavior can read those off. But that's for another day — that's another day's work; we've plenty to be getting on with as it stands now.
C
Absolutely, and as Rob says, there's lots of areas for improvement. One of the things I want to point out here, since we're talking about finding the actual test failure: obviously a good place to look is, you know, the code where the failure happened, right? So we have line numbers here, because these are Go tests that are being run.
C
Sometimes you have to follow this a little bit to, you know, figure out where it actually happened, because of the way that things bubble up with Ginkgo — like, if you're doing a BeforeEach or something like that, the failure may be on, you know, line 66 because of this failure down here. So that's typical Go testing stuff, and once again, if you have questions on any of these practices, definitely feel free to drop them in the sig-release channel or ci-signal.
C
The next thing is these test args that we pass. You'll see, as we've already shown in a number of different test failures that we've looked at, we have these indicators in the brackets here — basically tags on the different tests — and that allows us to choose different tests that we want to run based on the tags that they have, right? So here in this end-to-end GCE one, we want to skip all tests that have tags that are Slow, Serial, Disruptive, Flaky, or a Feature test.
C
You could also do things like ginkgo focus, which says only run tests with these tags, and then you can combine skip and focus, right, to get a specific subset. A good example of that is one that Rob was actually looking at earlier, I think: conformance, GA only.
C
If we actually went over to k/k here — and let me grab that again; I'm going to pain everyone and use the GitHub search, but it should be pretty easy here — you'll see that there's only really one area, or potentially two areas, where we're running GPU device plugins. So it'd be really easy to troubleshoot an issue with this, right, because it either happened here or here, likely.
C
So there's only really those tests that we're running, and we'll actually talk about this test in a minute as well. So when I pull up this prow job YAML that's configuring how everything is run — where does that come from, right? Well, test-infra is probably where you're going to spend most of your time if you're really interested in this whole CI signal and testing realm. You'll see some of the different names that we already mentioned here, and some that we didn't yet. A lot of these are used either to build images that are used to basically bootstrap the tests, or they're frameworks like kubetest, which allows you to, you know, spin up the Kubernetes cluster in a consistent way and test against it.
C
You'll also see Prow here, and some Testgrid stuff, although Testgrid is a separate repo, and there's lots of different things in here. One of the things — and I think there's an issue open for it — is that some of the tools are outlined here, but it'd be really great if we could get a diagram that showed how all these tools interact with each other, and get an overview of this. I know that's something Rob is passionate about.
B
Well, you want the diagram when you're starting out, and then as you learn as you go you see — oh yeah, it's okay, they just talk to each other. We're having a lot of chatter in the chat about data retention and how CI is data-backed, so Carlos is asking some good questions about how far Testgrid results go back.
B
Broadly speaking, there's two aspects to data storage from running CI jobs in the Kubernetes project. The metadata is, I think, loaded up by Kettle, which is what Dan is visiting there now — that's Kubernetes extract, transform, load; that's the ETL. I don't have great stats on the amount of data going into BigQuery, but I just know instinctively — you know, they're using BigQuery for a reason — it's a lot of data. So metadata pertaining to jobs should be funneled into BigQuery.
B
When it comes to job artifacts, the limiting resource there would be buckets — yes, so it'd be data storage buckets. And like Dan is saying, all of these tools are, for the most part, pretty much in test-infra — test-infra is, I suppose, a monorepo with a lot of tooling in it — and I suppose the most famous one, or the application that I know has moved out recently, has been Testgrid.
B
So the back end of Testgrid has been open-sourced, and the front end is not yet open-sourced. There's a lot of little things that I'd like to do for the CI signal use cases there, but when it comes to getting expert and deep knowledge: if you log issues on the new Testgrid repo, and if you make suggestions for front-end changes that you would like to have happen, and if they are implementable, the team who work on Testgrid will make those changes for you — and they're awesome to work with.
B
I speak to Michelle Shepardson, who works on that project, once every two or three months, and we talk about what we'd like to have happen on Testgrid, and she's awesome — she does great work on that.
C
Awesome, thanks for pointing that out, Rob — and that's a great point. You know, a lot of these issues that I'm going to here, I'm not really able to show you exactly what was happening at that point in time. If we still have that data available we could present it, but I also don't have all the context on what the data retention story is there.
C
And if we look at jobs — and Rob, you may have already said this, but I wanted to point out once again these different tabs here. First of all, the tabs at the top are dashboards, right — they're Testgrid dashboards. Each of the tabs listed under a dashboard is a job, and then each of the line items — well, build-master-fast was not a great one to pick for that; here we go — each of the line items here is a test that is being run. It may just be me, but when I first started out with CI signal it was really difficult for me, because sometimes we'll say, you know, "this test is failing" and we really mean a job.
C
Or, you know, "this job is failing" — and on all of our issues, which is something that I've kind of pushed on in the past, we'll say "Failing test". Sometimes, as in this example, I actually have the test listed there, but sometimes we'll have a job name, if we aren't able to pin it down to a single test or it's a lot of tests failing, and we'll say "Failing test" with a job name — which, yeah, is kind of true, but it's not as exact as I'd like it to be. So anyway, something to keep in mind.
C
It goes dashboards, jobs, tests. And as I mentioned, all of these job configs — and the association of the jobs to the dashboards that they're presented on in Testgrid — all live in test-infra, so we'll primarily be looking at the Kubernetes one here, where you'll see all the different sig ones. Specifically for sig-release, we can look at their release-branch jobs, which are actually auto-generated.
C
If there is one — if it's something like general end-to-end tests, there may not be a single sig that owns it. Go ahead.
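As an aside on how that association works: in my understanding, a prow job is attached to its Testgrid dashboards through annotations on the job's config in test-infra. A hedged sketch, with illustrative dashboard and tab names:

```yaml
# Sketch of the testgrid annotations carried on a prow job definition
# in test-infra (the dashboard and tab names here are illustrative).
annotations:
  testgrid-dashboards: sig-release-master-blocking, sig-release-master-informing
  testgrid-tab-name: gce-cos-master-default
```

Listing several dashboards is what lets one job appear on multiple boards at once.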
B
Yeah, so one of the things that might be worth pointing out here, as we look at this, is that as part of the efforts to corral, manage, and tend to CI jobs on the project, we have organized sort of a program of work in order to maintain these jobs. There was a release where we were getting a lot of noise in our signal pertaining to jobs not stating their resources properly.
B
So if we look at lines 36 down to 42 there, this job is requesting limits for CPU and memory, and requests for CPU and memory — and a lot of jobs didn't have those resources specified. As a result, it made life difficult for the scheduler to schedule those jobs on the infrastructure.
B
So from the point of view of finding work to do — I'll add the project board to the HackMD — the sig-testing team, and Aaron Crickenberger, did great work with Lauri Apple to set out the program of work whereby the sig-testing team and testing ops could get help from the community in order to improve CI configuration and things like that.
C
Yeah, absolutely — and since you bring that up, a good thing to remember with all of these different tests is that they are running in a Kubernetes cluster themselves, so they get scheduled to a node. Essentially, you know, just how pods get scheduled in Kubernetes: these jobs are running in pods, on a GKE cluster actually, and here you'll see that we have the resources for them, so that controls how they get scheduled to the nodes, right?
C
So, for instance, if I find one of the kind jobs here — they have pretty heavy CPU requests.
C
In fact, this actually means that they have to run on a node by themselves, based on the node size that we're running. And if there isn't a node available in the cluster for a job to run, then you'll see it gets FailedScheduling after a timeout of something like five minutes — that's one of the examples I have here in a moment, but that's the general idea. So yeah, all of the jobs live in the test-infra repo.

Another thing I want to mention is that the same job may be on multiple dashboards. So, for instance, this one on 1.17-blocking is a conformance job — so if you go over here and look at conformance-all, we would see, you know, all these different version conformance jobs here, also on this dashboard, right?
C
So a job being on a dashboard does not mean it's only there, and this allows different sigs to, you know, prioritize the different jobs that they want to look at — and there may also be, you know, other sigs or other groups that are interested in their status as well. All right, so many things — it's hard to get them all in there, but we're going to try. Also, I know we are kind of reaching the hour. Rob, as my counterpart in this talk, how are you feeling — you good to keep going? "I'm good to keep going, yeah." All right. Well, Bob, if you need to kick us off, you just let us know, but we've got a little bit more here, and we'll keep pushing until someone tells us we're not allowed to anymore. All right.
C
So this is an especially fun one — let me close out some of my tabs here. And this is actually — I can see my phone blowing up a little bit with some messages about this, because it's not completely resolved yet. But I chose this fail— well, it's really more than just failing tests, but there were quite a lot of failing tests. And this is very recent, so you'll probably see that they're still there — probably on 1.20-blocking we should be able to see some of them.
C
Let's go back a bit — yep, all right. So you'll see that this immediately started failing here, and none of the tests were actually even being run. So if we click on one of these, you'll see the extract step here — which is basically where we download that version of Kubernetes, based on the version marker that we have — was failing, right?
C
So we weren't able to run any tests, because we weren't even able to get a Kubernetes cluster to test against. And if we look back at the commits here: number one, we see that the test-infra commit didn't change. There is a change in commit hash for the other indicators here, but it went to "missing", so this isn't really helpful, right?
C
It was a shorter time for 1.19, and we'll see why in a second. This basically started happening across all the different branch-blocking boards. So the immediate thing I thought was, you know, when's the last time that we released? Well, we just had the 1.20.0 release, and it was about the same time that the jobs on 1.20 started failing — and then the next day we had patch releases for all the different branches, and that's when they started failing.
C
So there was probably something wrong with that release. It turned out that we had some faulty logic in our release tooling, which led to — we can go to this; well, I should have just clicked on "all releases" here.
C
You can see that, for instance, the 1.17.15 and the 1.17.16-rc.0 are on the same commit here, and what we want to do is actually separate those commits, so that we're able to determine they happened at different times. Because they were on the same commit, for all of the builds that were happening — we'll do this for 1.20 — the version marker that we're using is saying it's 1.20.0-1 plus the digest of the hash there, and what it should be saying is rc-whatever, or rc.0-whatever, for 1.20.1, right? We're moving towards the next release, and that wasn't happening because of those commits being on the same hash. And the way that version marker gets there is when this build job runs.
C
Sorry — we have to build these versions for the tests to start using. When they ran, based on the version that we are building, we'll say "publish extra version markers" — so, for instance, here we're saying publish a k8s-beta version marker — and then it also sees that we're on the 1.20 branch here. So if we look at some of the logs here, you'll see that we're publishing extra version markers, and then if we go down and actually look at the copy — let's see if I can find it.
C
Yep, so here you'll see that we're publishing version markers: latest, latest-1, latest-1.20, and k8s-beta. All right — and Bob says we do need to wrap up soon. Yeah, I'll basically say that we were building on the wrong commit, and this is just an example. The reason why I bring this one up is that it's an example of something that had nothing to do with bad code in Kubernetes, bad tests in Kubernetes, or even the infrastructure that the job was running on.
C
There was an issue in repo machinery — there's an issue in our release process — that caused the version that we were using to not match a regex that basically said "this is an appropriate version to download and use to run tests against". Maybe we can follow up with a separate discussion that talks about all the different layers this exposes, traversing, you know, from code in the release repo, to code in test-infra, to code that's in kubernetes/kubernetes, to download and extract that version.
C
But this is an example of how difficult it can be to find why a test has started failing. So definitely always ask for help, right, because there are other people who have context that you may not have. And the last thing I want to point out here is an example of a pod not being able to be scheduled — so this is what Rob was talking about, right, with those requests and limits.
C
It basically says there wasn't an available node, due to either insufficient memory, taints on the nodes, or insufficient CPU. It basically said we weren't even able to attempt to run this job — and that's something, and I think that this one I did here, yeah, that's something that you may want to open on test-infra, to say there's an issue with either our job configuration or our underlying —
B
Infrastructure, yeah. And I think it's worth saying that a massive, massive amount of work and effort goes into keeping everything up and running, from a job-running point of view. It is no small undertaking, and the team who keep things up and running are awesome — you know, they do great work, and when there's problems, you know, they're very, very responsive.
B
So — I think it hasn't failed, yeah. So I think we're gonna maybe finish up there, and just thank you, Dan, for all of that good info. You can see the diff — you can see how far ahead Dan is on the CI signal journey than I am — and, oh, but you're doing that; you do the fun stuff that I want to do next, you know. So I'd just like to thank everyone for attending.
B
What I would say is, if you want to follow up with the team, you can reach out to us on #release-ci-signal on Kubernetes Slack, and if you have any further questions that you want to ask, you can follow up with those questions there. And, broadly speaking, there was a lot of interest in what data goes where and how long we can see data for. Josh Berkus has pointed out a few features that he'd like to see in Testgrid, and I'd say to Josh: log them as issues on the Testgrid repo, and if they're doable, you know, they'll get done in time. And just thanks to everyone for the questions, and thanks to Joyce Kung for providing support during the chat as well.
B
Joyce is going to be the new team lead for CI signal for the 1.21 release, so I'm really looking forward to that. And yeah, I think we can wind it up there, Bobby, and end the meeting. Thanks, Dan.