A
Okay, hi to our CI hacking session today. I'm going to try to run this, and what we're going to look at is, first of all, this issue. If we have time at the end, we could maybe pick another one from triage and take a look, yeah. So, let's take a look at this issue. Korean did open this issue around a month ago, and let's take a look at what it is about.
A
I did already do some research there, so let's maybe take a look. First, for triage, I have a filter here which just shows us: okay, yes, this flake is still relevant. It happened two times in the last day in normal CI. So it wasn't...
A
It still makes sense to do it — okay, yeah.
A
What we have is: okay, we have this log message, and lots of jobs and artifacts where we could take a look. What I also did already is I tried to reproduce the issue, so we can maybe actually try and fix it, or find ways to dig further into this issue. So what I did for reproducing it is:
A
Let me open that in the editor. I just put a for loop around the test, so it gets run more often in CI, and this helped to reproduce the issue. So all I did was go into this test, which is the cluster upgrade runtime SDK test, which installs a cluster and upgrades it, and this all includes a runtime SDK extension which records — or tries to simulate that, or, well, it doesn't simulate:
A
It actually uses a test extension which then records into a config map that the hooks got executed, and these tests make sure that the hooks got called and everything got done properly. What I did was just put a for loop around the interesting part of the test, which is where the cluster gets created, upgraded and so on. Yeah.
C
Just for people who aren't aware: runtime SDK is a new feature that was added in kind of the 1.3 cycle, but the test extension is a completely external component that runs alongside the other management components in the cluster, and basically our controllers can just call out to the test extension during the cluster's lifecycle. This is just testing that that works, and the way it does so — which I guess was just mentioned — is to record state in a config map which the test then checks.
A
Yeah,
basically,
that's
the
it's
also
documented
in
a
book
here
things
clear.
There
are
some
hooks.
We
can
execute
like
the
before
class
to
create
hook
the
after
control
plane,
initialize
took
before
cluster
upgrade
and
so
on.
So
there
are
multiple
places
where
the
runtime
SDK
or
you
could
Define
a
hook
which
should
get
executed
for
your
clusters
during
life
cycle
yep
to
jump
back.
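For reference, a handler for one of these hooks is just a function the extension serves. Below is a rough sketch of a BeforeClusterUpgrade handler; the import path and field names are written from memory and may differ between releases, so treat it as an illustration rather than the exact API — the runtime SDK section of the book is the authoritative reference.

```go
package lifecycle

import (
	"context"

	runtimehooksv1 "sigs.k8s.io/cluster-api/exp/runtime/hooks/api/v1alpha1"
)

// DoBeforeClusterUpgrade sketches a BeforeClusterUpgrade handler: the
// extension fills in the response, and a non-zero RetryAfterSeconds would
// block the upgrade until a later retry.
func DoBeforeClusterUpgrade(ctx context.Context, req *runtimehooksv1.BeforeClusterUpgradeRequest, resp *runtimehooksv1.BeforeClusterUpgradeResponse) {
	// Allow the upgrade to proceed immediately.
	resp.Status = runtimehooksv1.ResponseStatusSuccess
	resp.RetryAfterSeconds = 0
}
```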
A
So all I did was add this for loop here, which closes somewhere here at the end, and I had to work around some stuff so that it basically cleans up again and the next iteration of the loop also works. What I also did here is adjust the call to dump and delete the cluster, to put the artifacts into a different directory; otherwise the cluster would always have the same name and the artifacts would just get overwritten.
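For illustration, a minimal sketch of that approach (not the actual PR — the two callbacks are hypothetical stand-ins for the real steps of the runtime SDK upgrade test):

```go
package e2e

import (
	"context"
	"fmt"
	"path/filepath"
)

// repeatUpgradeFlow repeats the interesting part of the spec several times
// within a single CI run to make a rare flake reproducible.
func repeatUpgradeFlow(
	ctx context.Context,
	iterations int,
	artifactFolder string,
	runUpgradeFlow func(ctx context.Context),
	dumpAndDeleteCluster func(ctx context.Context, artifactFolder string),
) {
	for i := 0; i < iterations; i++ {
		// Create the cluster, upgrade it, and verify the lifecycle hooks were called.
		runUpgradeFlow(ctx)

		// Clean up so the next iteration starts from scratch, and write this
		// iteration's artifacts into their own folder so they are not overwritten.
		dumpAndDeleteCluster(ctx, filepath.Join(artifactFolder, fmt.Sprintf("iteration-%d", i)))
	}
}
```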
A
I implemented that; it took lots of trial and error until I reached that state, because there were different ways you could run that test multiple times, but this one — the current state of the PR — is the one that I think reached a working state. And I did run it via the /test pull cluster API e2e command on main, which executes this test.
A
And
finally,
when
I
take
a
look
at
the
latest,
one
I
started
so
yesterday
it
just
started
the
the
test
again
and
if
we
look
into
it
we
should
have
exactly
hit
the
issue
again.
So
we
can
see
okay
here
the
issue
got
hit
and
yeah,
so
we're
able
to
reproduce
it
in
CI.
Using
this,
this
pull
request,
but
I
also
tried
after
knowing
that
we
can
reproduce
it
using
this
pull
request.
A
I also already did a little bit of analysis of failed jobs. Those were jobs from April, about a month ago already, and I was able to triage them a bit from what we have in the artifacts of these tests.
A
So
what
what
I
was
able
to
see
is
there
were
three
times
I
did
see
that
at
the
cluster
object,
the
condition
for
topology
reconcile
was
said
to
fail
with
the
message
Handler
has
not
been
registered
and
the
other
times
were
a
different
case.
So
I
did
more
of
it.
Inspection
to
this
one
and
to
see
okay
did
some
analysis
and
try
to
see
if
we
got
further
information
there,
but
maybe
we
just
go
down
that
road
with
an
example
to
see
where
we
are
yeah.
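For anyone who wants to do the same check programmatically, here is a rough sketch of reading that condition off the Cluster object using the cluster-api conditions util; exact constant and helper names may vary between releases:

```go
package inspect

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	"sigs.k8s.io/cluster-api/util/conditions"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// printTopologyReconciled fetches a Cluster and prints its TopologyReconciled
// condition, which is where the "Handler has not been registered" message
// showed up in the failed runs.
func printTopologyReconciled(ctx context.Context, c client.Client, key client.ObjectKey) error {
	cluster := &clusterv1.Cluster{}
	if err := c.Get(ctx, key, cluster); err != nil {
		return err
	}
	cond := conditions.Get(cluster, clusterv1.TopologyReconciledCondition)
	if cond == nil {
		fmt.Println("TopologyReconciled condition not set")
		return nil
	}
	if cond.Status == corev1.ConditionFalse {
		fmt.Printf("TopologyReconciled is false: %s\n", cond.Message)
	}
	return nil
}
```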
A
So I did check the config map, but that looked totally fine. So the problem must be at a different place. So maybe we just take a look and dig into one example, to see all the data.
A
This whole directory — so we can browse it using the IDE, which is way easier. So I just get that whole... oops, set that variable to that URL, and I already have that command somewhere here — or not, I'll have to complete it from scratch.
D
Yeah, so one question, just to get this right in my head: the runtime SDK is another controller — like, Christian, you know — a CAPI extension controller that runs on the Kubernetes management cluster, the CAPI management cluster, right?
A
Yeah, it's a deployment. In our case it's a deployment which is there — oh, so we also have it in here for our tests.
C
Sometimes setting the limits in deployments as well — per deployment — has an adverse impact, because previously it's not guaranteed, and we don't set the limits for other controllers in CAPI for these tests. So...
C
We don't have any limits on any of our deployments, okay. So setting the limits on the test extension — even if it's very low, I mean, above 10 millicores or something — could actually have the effect of it being more stable and having access to more CPU, if it's the case that another controller is actually eating up cycles, yeah.
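As an illustration of what that would look like — the concrete values and the way the Deployment is patched here are purely illustrative, not something decided in the discussion:

```go
package e2e

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// setTestExtensionResources gives the test extension container explicit CPU and
// memory requests/limits so it cannot be starved by other controllers.
func setTestExtensionResources(d *appsv1.Deployment) {
	d.Spec.Template.Spec.Containers[0].Resources = corev1.ResourceRequirements{
		Requests: corev1.ResourceList{
			corev1.ResourceCPU:    resource.MustParse("100m"),
			corev1.ResourceMemory: resource.MustParse("64Mi"),
		},
		Limits: corev1.ResourceList{
			corev1.ResourceCPU:    resource.MustParse("500m"),
			corev1.ResourceMemory: resource.MustParse("256Mi"),
		},
	}
}
```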
A
Yeah, so the test extension itself lives here in test/extension, and it's basically a reference implementation of some basic hooks, which do nothing more than write the response into a config map — responding with a predefined response from the config map.
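Roughly, the recording idea works like the sketch below. This is not the real test extension code (that lives under test/extension); the namespace, ConfigMap name, and key suffixes are illustrative:

```go
package lifecycle

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// recordHookCall sketches the idea described above: the handler reads the
// preloaded response for a hook from a ConfigMap and records that the hook was
// actually called by writing a "<hook>-actualResponseStatus" entry back.
func recordHookCall(ctx context.Context, c client.Client, hookName string) error {
	cm := &corev1.ConfigMap{}
	key := client.ObjectKey{Namespace: "test-ns", Name: "test-extension-hookresponses"}
	if err := c.Get(ctx, key, cm); err != nil {
		return err
	}

	// The preloaded response tells the handler what to answer with.
	_ = cm.Data[hookName+"-preloadedResponse"]

	// Record that the hook got executed so the test can assert on it later.
	if cm.Data == nil {
		cm.Data = map[string]string{}
	}
	cm.Data[hookName+"-actualResponseStatus"] = "Success"
	return c.Update(ctx, cm)
}
```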
A
I hope the size is okay for inspection. So, taking another look at the log here where the issue happens: okay, that's actually exactly the message — the call is not recorded in a config map which is in this namespace and is called like this. So maybe let's first take a look at that config map, how it looks now. We have to find where it is: it's in our bootstrap cluster, because the test extension — all of this — happens in the management cluster, which is the bootstrap cluster.
A
So let's take a look at the bootstrap cluster resources. Inside our namespace there's the config map, and let's take a look at the hook responses here. The interesting part should be the data section, which has the preloaded responses — and we can actually ignore the preloaded responses, if I'm right, because that's just the answer which gets returned — right? — if I'm right, yeah.
A
And so if we take a look: the BeforeClusterUpgrade one didn't work — but actually, it has recorded this correctly. So...
C
So
the
reason
so,
if
we
go,
can
you
go
to
the
code
where
the
test
is
actually
failing?
I
think
it's
four
seven,
six
line,
four,
seven
six
indeed,
runtime
SDK
upgrade
test.
A
We have some local changes here, so I'll just stash them — but I should be on the branch, though. All right, so it's in here where it fails, or where this timeout gets hit, which is why...
C
So
one
issue
we
have
I
think
so.
Can
you
just
check
the
stack
of
the
error
again,
just
to
make
sure
that
this
is
where
the
air
is
getting
returned
from
our
Omega
might
modify
that
the
way
that's
annoying
376.
C
Yeah, just because I know that error gets called in two places.
C
I guess because they need to be unmarshalled — there's probably a bug there in how we're marshalling and unmarshalling it. But that should be fine in this case.
A
Yeah, so — but it's also matched to, like, BeforeClusterCreate, yeah, which also has a status of success, and this — and this is the actual response, so...
A
We just get the whole config map and loop over it — yeah, ignore this whole printing stuff here. It just returns the ConfigMap.Data and then extracts the data, or checks whether the data is in there: there just needs to be a "hook name"-actualResponseStatus entry; it doesn't matter what the value is.
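In other words, the assertion boils down to something like the following sketch; the helper name and ConfigMap key are illustrative, and the real check additionally prints the data for debugging:

```go
package e2e

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// hooksWereCalled fetches the whole ConfigMap and makes sure there is a
// "<hook>-actualResponseStatus" entry for every hook we expect to have been
// called; the value itself does not matter here.
func hooksWereCalled(ctx context.Context, c client.Client, key client.ObjectKey, hooks []string) error {
	cm := &corev1.ConfigMap{}
	if err := c.Get(ctx, key, cm); err != nil {
		return err
	}
	for _, hook := range hooks {
		if _, ok := cm.Data[hook+"-actualResponseStatus"]; !ok {
			return fmt.Errorf("hook %q was not recorded as called", hook)
		}
	}
	return nil
}
```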
A
It's just unfortunate that it gets printed before all the test output, but basically the last one should be the last one — the last time it entered that place. So yeah, it prints exactly that out.
C
I mean, yeah — so can you go back to your logs and check the times, yeah. So one issue that we run into constantly with flakes is that controller-runtime uses a cache for objects. I'm not sure about caching config maps, possibly not — do you know, Christian? Sorry, again: are we caching config maps in controller-runtime? Probably, right; we don't cache Secrets, probably.
C
So that means that, because we're caching them — this is to prevent us from constantly storming the API server for information — everything we have is always slightly out of date. Most of the time there's no change between what we have and what the API server has, but there is always a risk of us getting something that is 100 milliseconds old, or whatever it is, like what I said earlier.
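For context, this is the difference between the cached client and a direct read in controller-runtime. A small sketch (the object key is illustrative) of how one could compare the two when chasing a stale-read theory:

```go
package inspect

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// compareCachedAndDirect reads the same ConfigMap through the manager's cached
// client and through the uncached API reader. The cached read may lag slightly
// behind the API server; the direct read reflects its current state.
func compareCachedAndDirect(ctx context.Context, mgr ctrl.Manager, key client.ObjectKey) error {
	cachedCM := &corev1.ConfigMap{}
	if err := mgr.GetClient().Get(ctx, key, cachedCM); err != nil { // served from the informer cache
		return err
	}

	directCM := &corev1.ConfigMap{}
	if err := mgr.GetAPIReader().Get(ctx, key, directCM); err != nil { // goes straight to the API server
		return err
	}

	// Comparing resourceVersions shows whether the cache was behind at this moment.
	_ = cachedCM.ResourceVersion == directCM.ResourceVersion
	return nil
}
```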
C
CPU constraints can make that really important, because the cache might not update as quickly if we have constraints. So maybe it's the cache. So — sorry, question — what I'm looking for is: were you in the full logs?
A
So that's what I copied over here now into this file, so I confirmed it, and there are also the managed fields in here — and the managed fields can tell us which parts of the object got updated the last time. And this here says: okay, these fields — which do not include the BeforeClusterUpgrade actual response — were last updated at 17:51 and 34 seconds, which basically is before the time the test failed; or rather, the last time it was updated was before the test failed.
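The managed fields can also be read programmatically; a small, purely illustrative sketch of dumping them for the ConfigMap:

```go
package inspect

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// printManagedFields dumps the managedFields entries of a ConfigMap: each
// entry names the manager that owns a set of fields, the operation, and the
// time of its last update, which is how the "last updated at 17:51:34"
// observation above can be made.
func printManagedFields(cm *corev1.ConfigMap) {
	for _, mf := range cm.GetManagedFields() {
		fmt.Printf("manager=%s operation=%s time=%v\n", mf.Manager, mf.Operation, mf.Time)
		// mf.FieldsV1 lists the owned fields, e.g. whether
		// data["BeforeClusterUpgrade-actualResponseStatus"] is among them.
	}
}
```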
C
Yeah, can we look at the CAPI controller manager logs, just to see if there's anything odd happening? Because the reason I'm very interested in this: we probably have a timeout — maybe if we just set the wait period in our test to five minutes, this flake will never happen again, which would be a win for us — but if we can patch an issue that's going on in the controller manager, the core CAPI controller manager...
A
I just want to place it somewhere nice, because that's JSON logging — it doesn't have the perfect timestamp in there. So...
C
Oh awesome — can you post that query somewhere, please?
C
When you just dump the thing without your jq query — just to see if they're the same.
A
Okay, you may want to filter them some more.
A
Sorry for the fuzzy things — so, any of...
C
Where are we exiting? That seems to show up pretty often, I guess: waiting for the Kubernetes node on the machine to report ready state.
C
Those are both positive at this point, and then there is a BeforeClusterUpgrade hook — I think that's the order in which they're called. So what's happening here, I guess, critically, is that the machine pools seem to still be rolling out.
A
Everything is still on the pretty old version.
C
But
sort
of
what
I
was
interested
in
was
a
PR.
It
was
from
like
a
month
ago.
Okay,
which
is
in
machine
pools,
they
added
a
node
water
to
machine
pools
yeah.
So
this
should
be
fine
for
that
issue.
A
Right, but I had other cases where this was not the state of the cluster at that point. I also had some...
C
Yeah, but — because if it's the case, if all of our cases look like what we're looking at now, we try to... run the registration.
C
That's probably the case in this one, yeah. It could also be the other case, of course, because we don't have health checks and stuff on our... no — but "has not been registered" means that the extension config doesn't exist, basically, or hasn't been read yet. In which case we probably already called it earlier in the test, right? Because we're failing at the upgrade, so we would have to have had it in the case we're looking at.
C
So
it
this
definitely
happens
before
we
have
any
rollouts,
but
our
machine
goes
aren't
rolled
out
yet.
C
Yeah, exactly — so it's kind of not free yet for the control plane to be upgraded. Yeah, yeah.
A
Maybe we should take a look at whatever came before that timestamp.
C
We won't upgrade if the machines aren't ready — okay, that's definitely true for the control plane. But yeah, I'm going to have to drop at the hour, but...
A
But
I
think
we
don't
have
enough
information
in
the
logs
here
it
just
it
just
says:
okay,
we're
waiting
for
the
node
on
the
machine
to
report
ready
state
but
yeah.
The
theory
currently
is
the
machine,
the
node.
Still
it
doesn't
get
ready
or
do
not
get
ready,
which
is
why
it
doesn't
reach
the
stadia
to
call
the
hook.
C
Yeah, so I guess the number one thing here is: can we figure out where the — if it's the topology reconciler that is exiting because it doesn't want to upgrade before the machines are ready, why isn't that being logged somewhere? Or maybe it is being logged somewhere and we're just missing it in all of the logs. But if it isn't being logged, it should be — that's a key kind of decision that we're making.
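If it turns out that decision isn't logged today, the fix would be a log line along these lines in the reconciler — a sketch with illustrative function and key/value names, using the logr-style logging that controller-runtime provides:

```go
package topology

import (
	"context"

	"k8s.io/klog/v2"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	ctrl "sigs.k8s.io/controller-runtime"
)

// logUpgradeDeferred sketches logging the decision to hold back the control
// plane upgrade while machines are not ready yet, so the reasoning shows up in
// the controller manager logs.
func logUpgradeDeferred(ctx context.Context, cluster *clusterv1.Cluster, readyMachines, desiredMachines int) {
	log := ctrl.LoggerFrom(ctx)
	log.Info("Waiting for machines to become ready before starting the control plane upgrade",
		"cluster", klog.KObj(cluster),
		"readyMachines", readyMachines,
		"desiredMachines", desiredMachines,
	)
}
```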
C
Okay, I'm going to drop, but thanks a lot for doing this, Christian. This was, I think, super useful, and I feel like we could solve this right now by just adding longer timeouts — because of what we've seen, which we actually had in the test and I pulled them down at some point — but I think maybe we could actually get to the root of a more serious problem here if we keep looking, yeah.
C
Yeah, I'd be happy with that as a to-do — to do both. If you want to open a pull request — I'm not sure what that timeout is right now. Is it five minutes? It's 30 seconds; it's only 30 seconds for that entire handler.
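The quick-fix side of that to-do would just be giving the check a longer window. A sketch with Gomega, assuming the usual dot-imports in the spec; the helper name and the five-minute value are illustrative, not what the test uses today:

```go
// Raise the window for the "hook responses were recorded" check.
// checkLifecycleHookResponses is a hypothetical helper wrapping the
// ConfigMap check shown earlier.
Eventually(func() error {
	return checkLifecycleHookResponses(ctx)
}, 5*time.Minute, 10*time.Second).Should(Succeed())
```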
C
As long as we record it separately — well, we can continue on that issue, and maybe change the title of the issue or whatever. But...