From YouTube: Kubernetes SIG Node 20230822
Description
SIG Node weekly meeting. Agenda and notes: https://docs.google.com/document/d/1Ne57gvidMEWXR70OxxnRkYquAoMpt56o75oZtg-OeBg/edit#heading=h.adoto8roitwq
GMT20230822-170502_Recording_640x360.mp4
A
Hello, hello. Today is August 22nd, 2023. This is the SIG Node weekly meeting. Welcome, everybody. We have a very short agenda today, so let's dive into it. The first item is a question from maratha.
A
B
Yes. So, as I said there, in 1.28 we added support for stateful pods running in user namespaces; previously we only supported stateless pods. I was wondering if we can make a blog post, or who I should talk to to make that happen. Monroe already mentioned that it seems okay, but yeah.
A
So this is a call for a blog post for 1.28. I think the deadline passed a long time ago, but at the same time, they don't mind having new posts. I asked somebody from SIG Docs, and the alternative is to send a PR to the website and try to find somebody from SIG Docs to approve it. I don't think anybody will mind.
A
We still have time, because release blog posts are kind of spaced out, so we don't post everything on a single day. So yeah, I would suggest starting with the PR, like have a short PR. If you want to discuss the text of the post, you may start with a Google Doc; it's just easier to collaborate. Once you have it, let's find somebody from SIG Docs to approve it. If you have trouble finding somebody, let me know and I will help you navigate.
B
A
E
Yeah, so I did a Doodle poll to find the timing for the image garbage collection rework working group, and that would be on Wednesdays at, I guess, 11, or no, 12 Pacific.
E
So I was wondering if we could add that to the SIG Node calendar. I don't think I have permissions to do that, so I would need someone's help.
E
Yes, I'll shoot it to you. Cool, that's all.
E
Absolutely, yeah, and I'll post it in the channel and on the email list as well. I'm thinking about just having a preliminary meeting tomorrow. It probably won't be very busy: just meet and go over goals and, overall, maybe some desires out of the working group. So it'll probably be pretty quick, but I figure we get the ball rolling. I will post that where appropriate. Thank you.
F
I'm sorry, that was me. So I submitted this a while back and I was talking with Peter on the issue. It's sort of a philosophical question: where does the responsibility lie between node problem detector and kubelet with respect to detecting some issues?
F
So the root cause of this was that I've seen a few times where file systems can go read-only, not just from physical failures. XFS, for example, has a really weird issue where you can no longer write files, even though you apparently have plenty of free space and plenty of inodes, if you get excessive free space fragmentation.
F
What occurs when that happens is that kubelet basically stays Ready and everything looks great, except all your pods are failing because they can't actually write to any file systems whatsoever [inaudible], so it's just a pretty bad failure scenario for users. So with this change I was looking at adding something to kubelet to just write to the pod data directory every few seconds. If that fails, then something has gone massively wrong, and we can mark the node not ready and resolve it.
F
Then Peter was saying: well, maybe it's more a responsibility of something like node problem detector. Which, yeah, maybe it is. I'm really just looking for guidance: what's the line between detecting issues internally with kubelet versus node problem detector?
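The periodic write check described in this turn could look roughly like the following Python sketch. It is an illustration only, not kubelet code (the kubelet is written in Go): the function name `can_write` and the `.write-probe-` file prefix are assumptions made for the example, and nothing here reflects the change that was actually proposed.

```python
import os
import tempfile


def can_write(directory: str) -> bool:
    """Return True if a small file can be created, flushed to disk, and removed."""
    try:
        # mkstemp creates a uniquely named file, so concurrent probes never collide.
        fd, path = tempfile.mkstemp(prefix=".write-probe-", dir=directory)
    except OSError:
        return False  # e.g. ENOSPC, EROFS, or a missing directory
    try:
        os.write(fd, b"probe")
        os.fsync(fd)  # force the write to reach the file system
        return True
    except OSError:
        return False
    finally:
        os.close(fd)
        try:
            os.unlink(path)
        except OSError:
            pass  # best-effort cleanup of the probe file
```

A caller would run `can_write` on the directory of interest every few seconds and treat a `False` result as the signal that something has gone wrong. Because the probe performs a real write, it does not depend on how the file system is mounted.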
G
Maybe I can answer this question. So I think both, right? For this one, kubelet cannot perform normally, there are abnormal behaviors, right, so it should mark itself. And in our system, when we designed it, actually both kubelet and node problem detector can mark the node not ready. When kubelet marks it, that is just based on the prerequisites for whether kubelet can operate normally.
G
Next, a file system going read-only is, a lot of times, caused by the kernel. For example, [inaudible] the kernel tries to protect the whole system, and the file system gets turned read-only, right?
G
So those kinds of problems could be detected by node problem detector, which reports those kinds of things. In certain cases kubelet itself notices nothing at that moment, right, but the job cannot access or write its data.
G
So this is why, in those cases, node problem detector is supposed to detect those problems. But the one challenge I know of is that today we haven't really updated node problem detector as the kernel evolved. For example, we already started to switch to supporting cgroup v2. So there are a lot of problems, because node problem detector's checks are from the previous generation. After these eight or nine years of Kubernetes, a lot of things have changed, so today I think node problem detector needs a little bit more work.
G
F
G
A
Mm-hmm, yeah. One question will be how to recover from that state: if kubelet checks the prerequisites and marks itself as not ready or unhealthy, how will it get back if the system recovers by itself?
F
So in the situation that I've seen, there is no great way to recover other than to roll your node and start over.
G
That's it: once it is in that state, you just have to reboot the node. So this is why we only reported the status, and then the job supposedly should be rescheduled and the node should be recovered by a reboot. That's why we built this auto repair. But the auto repair is not done at the node level; it has to be done at the global level, because you need to make sure the scheduler and the API server know that the node is not ready and that the work should be rescheduled to the other nodes, yeah.
G
G
F
So the actual issue that I saw was not one where the file system is mounted read-only. It's just a file system you can no longer write to.
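The distinction drawn here can be made concrete with a small Python sketch. A check that only inspects mount options, as below, reports a file system as read-only when it was mounted or remounted with `ro`, but it would say nothing about a file system that is still mounted `rw` and simply rejects writes. The function name `is_mounted_read_only` and the sample device names are assumptions made for the example; the input is text in the format of `/proc/mounts` (device, mount point, type, options, and two numeric fields).

```python
def is_mounted_read_only(mounts_text: str, mount_point: str) -> bool:
    """Return True if the last entry for mount_point carries the 'ro' option."""
    read_only = False
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[1] != mount_point:
            continue
        # A later entry for the same mount point shadows earlier ones.
        read_only = "ro" in fields[3].split(",")
    return read_only


# Hypothetical sample in /proc/mounts format.
SAMPLE = (
    "/dev/sda1 / xfs rw,relatime 0 0\n"
    "/dev/sdb1 /data xfs ro,relatime 0 0\n"
)
```

With this sample, `/data` is reported as read-only and `/` is not. A file system in the state described in this turn would look like the `/` entry, which is why detecting it requires attempting an actual write.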
A
Sorry, thank you. I'll keep my comment as a general rule: in general, if we check for prerequisites in kubelet, we need to make sure that either it's unrecoverable for sure, and we're 100% sure about it, or there is a way to recover and get back into a good state.
G
A
Going once, going twice. Thank you, everybody; short and useful meeting. Let's start 1.29 strong, but also don't forget to take the rest of your vacation and enjoy the rest of the summer.