From YouTube: Kubernetes SIG Node 20210126
Description
Meeting Agenda:
https://docs.google.com/document/d/1j3vrG6BgE0hUDs2e-1ZUegKN4W4Adb1B6oJ6j-4kyPU
A
All right, recording started.
B
So, it's the January 26th SIG Node meeting. So yeah, I think we are at the stage of development in the release when there are a bunch of work-in-progress PRs right now.
B
If you look at created PRs, you can see quite a few of them marked work-in-progress, so this rise in PR count may be expected. But still, there are a few PRs, and we identified a lot of PRs and CI test groups that need to be reviewed and moved forward; they're quite straightforward, so yeah.
B
If you have time, please review PRs. But I don't feel very bad about this plus seven on the PR count, because at this stage people are working on features, so this is great.
C
Yeah, I'll jump in and say: yesterday, in the second half of the CI meeting, we met and discussed. You know, we have the CI test-specific board, and then we have the board which is everything else; Sergey is mostly managing the test board and I'm managing the everything-else board.
C
So from that meeting we came away with a few actions. Action number one was for me to put a little documenting note at the top of each column on the non-test board, to explain what things should go in each column so other people can jump in and participate, and as well to add a sort of longer-form document.
C
So I haven't had a chance to draft that document yet, but that should be coming. And the other thing I should mention is that I think we may want to move the CI meeting, which is currently on Mondays, because it's at kind of a bad time for some people. So I think Sergey is going to send a doodle.
D
Yes, hi. So I missed the SIG Node meeting a couple of weeks back, so I'm not exactly sure what was discussed there. I know I was looking into the issue which caused this revert to happen.
D
I looked into the e2e logs and the CI logs, and I saw that, yep, this fix will actually cause the pod delete to be slowed down, which is expected. That's the reason I even mentioned in the PR comments that after this fix you should see a little bit more time from the moment you delete the pod to the moment it is actually removed from the API server. Because in the current world, what's happening is that not everything is being cleaned up: a few things are just left for the GC to take care of, and those things actually include something important, like network resources and other stuff.
D
For cluster admins, that means they would know beforehand why there are so many pods which are actually asked to be killed but still not yet fully terminated; with some network issue happening on some nodes or something, those kinds of issues would be caught straight away.
D
So I understand that it was reverted because it's causing some P1 issue in the CIs there. Since I was not part of the other SIG Node meeting, I just want to see what was decided on this here; that way I can look into this code more, fix whatever is required, and create a new PR back with that fix.
B
Some context on the investigation: originally it was proposed once, long ago, and it was implemented but then reverted back once already, as far as I remember because of the delay introduced for pod shutdown. Then there was a PR that attempted to fix it, and it was fixing it in most cases. The idea was that when we receive the lifecycle event that a pod is being removed, or a container is being removed from a pod, we will try to clean up the sandbox, and it was working in most cases.
B
But sometimes it was raised at a stage when the pod is not ready to be removed, and we failed to remove the sandbox and were stuck waiting for GC to pick up the slack. So it could take up to one minute to even start the deletion process of some sandboxes, and then, depending on how many sandboxes you have, it may take a while. One minute of extra pod cleanup time is not acceptable.
B
Many tests will fail, and it's not a super good experience; that's why we reverted it. But I mean, I think it's reasonable to want to clean up all the resources, I don't know why not. We just need to make sure that we clean them up as fast as possible and don't wait for GC to kick in.
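The behavior described here can be sketched as follows; this is a minimal illustration with hypothetical names, not the actual kubelet code. On a pod-termination lifecycle event the sandbox removal is attempted immediately, and only when the pod is not yet ready for removal does cleanup fall back to the periodic garbage collector, which is where the up-to-one-minute delay came from.

```go
// Minimal sketch (hypothetical names, not the actual kubelet code): on a
// pod-termination lifecycle event, try to remove the sandbox right away; if
// the pod isn't ready yet, fall back to the periodic garbage collector, which
// is what introduced the up-to-one-minute delay that caused the revert.
package main

import (
	"errors"
	"fmt"
)

var errNotReady = errors.New("pod not ready for sandbox removal")

// removeSandbox stands in for the CRI RemovePodSandbox call.
func removeSandbox(sandboxID string, podTerminated bool) error {
	if !podTerminated {
		// Too early: containers still exist, so removal fails and the
		// sandbox is left for GC to pick up later.
		return errNotReady
	}
	fmt.Printf("sandbox %s removed immediately\n", sandboxID)
	return nil
}

func onPodLifecycleEvent(sandboxID string, podTerminated bool) {
	if err := removeSandbox(sandboxID, podTerminated); err != nil {
		// Fallback path: GC runs on a fixed period (minutes), which is
		// where the extra pod-cleanup latency came from.
		fmt.Printf("deferring %s to garbage collection: %v\n", sandboxID, err)
	}
}

func main() {
	onPodLifecycleEvent("sandbox-a", true)  // fast path
	onPodLifecycleEvent("sandbox-b", false) // slow path via GC
}
```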
D
Yeah, so yes: when the first revert happened, it was mostly the actual Kubernetes e2es doing the release validation tests that were taking more time, so I was running those to verify my fix. I think now it's happening on the CI side, where they use, I think, kind.
In that network scenario, I can try to simulate it and see where exactly the delay is happening, and try to fix that part. Okay, sure, I'll look into exactly the issue which failed, which caused this thing to be reverted, and I'll try to see where it actually happens. Thanks, thanks Eddie.
B
Yes, and I think Elana mentioned some failures on CRI-O as well that were happening because of that.
A
I also want to mention that there's also the e2e test: make sure the test for pod removal and deletion is correct and that those pass. And another thing: the folks who reported it have very easy reproducing cases for that issue, so make sure those reproducing cases are covered in e2e; to me they represent how they try to integrate, how they are using it, so make sure that also passes.
A
I also want to mention that I just noticed on the agenda that the person (I don't know how to pronounce the name) and also another person have a proposal, a PR that tries to fix it based on what you proposed; it has peer reviews and approvals, but the decision was not to take it at the time, because it was too rushed close to the release. So the decision was to revert, because we need to stop the regression immediately. So that's the context.
D
Sure, yep. The only delay here would be mostly because of the CNI thing. I just want to know: what was the CNI being used in that e2e scenario?
B
It wasn't that CNI was taking too long. The issue was that the logic that attempts to delete a sandbox was called when it's too early to delete the sandbox, and because it's too early, it didn't delete the sandbox and waited for garbage collection. So the delay was introduced by garbage collection, not by CNI or any runtime interface.
F
Okay, thanks Dawn. I don't think I need to share; I will just talk about this. So we are working on this volume health feature, and we're trying to move it to beta. I have the link there.
F
So one question that came up during the review of the KEP: for this feature, we have this volume health monitoring agent deployed as a sidecar with the CSI driver on every node, and each agent has a pod informer; the agent will add an event to the pod if an abnormal volume condition is detected.
F
So I think the question is: will there be any scalability concern with having a pod informer for each agent on each node? Does the SIG have any suggestions on how to monitor this? Do you have any common framework that we can leverage, or is it okay that each agent keeps a pod informer?
G
A question, this is Derek. I appreciate you reaching out; I'm not as familiar with the implementation of this, I guess, but the first question I would have is: is this informer listing and watching only pods bound to the node where that agent is running, or are you list-watching all pods?
G
Are the rights for this driver equivalent to, say, what the NodeRestriction admission plugin enforces?
G
Maybe another way to think about it: the reason I ask is that we did hit scale issues on the kubelet in the past, which I'm sure Dawn can speak to, around the number of resources that are watched as the scale of the cluster grows. We did a bunch of work to bound both the access rights of each kubelet to read that information, and when and how it chooses to refresh that information.
G
So you'll see code in the kubelet, around the secret manager and config map manager, that ensures it can only read the things associated with pods that are bound to that node; and then the identity of the kubelet is looked at when calling the API server to say "can I list pods?", so it restricts you to only being able to see the things that map to that identity or that bound node. So the first scale-concern question would just be ensuring:
G
Are
you
watching
more
than
what
is
needed
and
then
there's
probably
like
a
a
secondary
question
on
like
how
privileges
are
scoped
to
whatever
monitoring
solution?
This
is
that
keeps
it
maybe
as
restricted
as
say,
the
cubelet
identity
is
restricted,
but
those
are
the
first
things
that
come
to
mind.
I
don't
know
don
if
others
come
to
your
mind,.
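For reference, the usual way to address the first question in client-go is to scope the informer's list/watch to the local node with a field selector instead of watching all pods. A minimal sketch, assuming the agent learns its node name from the Downward API via a NODE_NAME environment variable:

```go
// Minimal client-go sketch of a node-scoped pod informer, assuming the agent
// gets its node name via the Downward API (NODE_NAME env var). The field
// selector limits the list/watch to pods bound to this node, which keeps the
// watch cost per agent roughly constant as the cluster grows.
package main

import (
	"os"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	nodeName := os.Getenv("NODE_NAME")
	factory := informers.NewSharedInformerFactoryWithOptions(
		client, 30*time.Second,
		informers.WithTweakListOptions(func(o *metav1.ListOptions) {
			// Watch only pods scheduled onto this node.
			o.FieldSelector = fields.OneTermEqualSelector("spec.nodeName", nodeName).String()
		}),
	)

	podInformer := factory.Core().V1().Pods().Informer()
	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(oldObj, newObj interface{}) {
			pod := newObj.(*corev1.Pod)
			_ = pod // inspect volumes here; emit an event on an abnormal condition
		},
	})

	stop := make(chan struct{})
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	<-stop
}
```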
A
I think those are the top concerns we have. And we also have another one: each vendor agent does its own monitoring, each doing its own monitoring without any overall coordination, and basically it takes over; this is more something to negotiate, it's not an open source question as such. But for production: if GKE deployed some agent that used a lot of a customer's resources, they would make a big fuss out there.
A
So that's why, on the open source side, the requirement I'd state is just: please make sure you're not using too much resource. I just want to say that the same applied in the past when the node problem detector was deployed: even back then, if something was really using too much resources, I would basically just push back and say no, sorry, even if you maybe provide good functionality; we'd need to redo a lot of work to make sure.
A
We
don't
have
those
documentation,
but
I
maybe
dig
into
the
really
old
of
the
know,
the
problem,
detector
and
and
because,
when
they
first
we
being
raised
those
problems
because
northbound
they
can
actually
have
the
both
problem
and
the
way
is
the
skinny
build
problem
initially
and
another
one
is
just
many
times:
it's
not
the
first
time
I
have
the
resource,
consumption
issues
and
resource
consumption.
A
They
had
like
the
cpu
memory
and
also
even
some
discard,
usually
in
the
past,
give
us
and
also
scalability
problem,
and
also
so
also
does
tag
of
the
note
because
sent
too
many
information
back
to
the
api
server
and
all
those
kind
of
things
using
other.
So
we'd
be
so,
but
we
don't
have
like
one
dark
capture
all
of
those
things
I
just
have
to
dig
into
some
old
design,
dock
and
or
maybe
some
old
bag.
F
Okay. So there were some proposals, I think between SIG Node and SIG Instrumentation, on how to monitor node components. Is that complete, or is it still open? I just want to know if that's something we can use as well; is there any architecture there that we can leverage? This is maybe more about monitoring, the metrics side, so I'm guessing SIG Instrumentation, right. So do you have any?
A
That's
a
good
question.
Actually
I,
if
I
remember
correctly
that
how
to
monitor
know
the
component
that's
made
a
couple
years
ago
by
the
vishnu
and
actually
initially
they
found
a
signal,
but
do
we
want
to
partner
with
sick
instrumentation?
If
I,
if
I,
if
we
are
talking
about
the
same
also,
then
that's,
I
believe,
that's
the
richness
make
that
didn't
that's
totally
in
the
past.
At
this
moment
and
and
unfortunately,
only
internally,
we
discussed-
and
he
even
didn't
made
into
the
signal
to
discuss
that.
A
But
but
I
did
talk
to
him
about
the
internal.
We
do
the
review.
I
like
the
proposal,
but
just
someone
have
to
pick
up
that
work
and
make
progress
on
that
one
and
also
need
to
came
to
the
signal
and
and
make
progress
here,
got
the
approval
here.
A
So yeah. In some of the products, I can share, in GKE and several productions, what I understand is that people build plugins on the node problem detector, with dedicated node detectors monitoring some of the per-node daemons. The reason I know is that I made this suggestion at SIG Node and also inside GKE; we haven't made that progress yet, but I know someone approached me because they had already deployed those plugins in their production.
A
So
that's
why
I
know
so.
We
can
talk
about
more
those
kind
of
things
I
didn't
suggest
them
to
upstream
those
things
and
the
suggestions.
And
but
I
have
to
say
that
at
the
signal.
G
Here,
just
to
maybe
clarify
my
own
understanding,
so
we
have
made
progress
within
the
sig
to
distribute
some
sources
of
metrics
that
used
to
be
coming
from
the
cubelet
to
allow
it
to
become
from
third-party
sources.
So
a
gpu
metrics
would
be
a
good
example
of
something
that,
like
as
a
trend,
we're
trying
to
allow
people
to
own
their
monitoring
unique
to
their
component
and
not
have
to
go
through
the
cubit.
I
guess,
and
so
just
trying
to
understand.
G
When
I
look
roughly
over
the
cap
you
have
here,
it
looks
like
you
have
new
metrics
you're
wanting
to
emit
and
is
the
question
more
aligned
of.
Is
that
a
good
or
a
bad
thing?
To
do?
I
I
mean
I,
I
have
no
objection
to
that.
F
Yeah, so what do we have there: the volume health right now, it's not really metrics, because, well, I guess it could be, but it's not implemented that way. Right now it's just events reported on pods, so it's not really integrated with the metrics support.
C
One thing I'd suggest, because I have not seen this KEP at all: it probably would be worthwhile pinging SIG Instrumentation for a review. I don't think they're listed.
F
I think we had the original one, the alpha version, reviewed. I believe there was someone.
C
Yeah,
the
other
thing
that
I
would
say
is
adding
metrics,
because
all
metrics
start
as
alpha
doesn't
require
a
cap,
so
you
can
just
go
ahead
because
they
can
be
turned
off
now
so
or
at
least
they
can
be
mostly
turned
off.
Now
I
think
harder.
Turning
off
is
coming
in
this
release,
so
I
would
totally
suggest
that
you
sync
up
get
that
feedback,
but
you
know
go
ahead
and
run
with
that
stuff.
C
It's something that you can sort of experiment with, because there aren't necessarily great guarantees; that's why we put in the likes of a runtime back-out.
F
Okay, yeah. Another thing is that we were also thinking that in the future some of this information can maybe be used for us to take some actions: if, you know, something happened, maybe a controller will actually do something with those PVCs, and for that we probably can't really use metrics, I'm thinking. So that's one reason we may not go with the metrics route.
G
Yeah, so I guess there's no issue in other components reporting related events around a resource; you know, the kubelet can talk about pods in an event resource, and the scheduler can, and that's all well and good. I do think there's an issue if you're trying to build systems with some guarantee that are looking to respond to events as if they were messaging protocols.
F
Yeah, because we couldn't decide what to do with those events; that's why right now they're just events, they're not a first-class field. I mean, if we knew exactly what to do with that information, then we could put it in PVC status or something; that's still not decided, so at least those are not in the current KEP.
A
This is why I mentioned earlier Vishnu's proposal on how to manage node components, plus the node problem detector: that can actually help you figure out how to take action. So there's the transient problem state, right, and there's also a condition like a permanent problem; it's about how you detect those kinds of things, pop them up to the upstream components, and then take action. This is how we are trying to deal with, for example, the network issues in our production.
A
So
then
we
detect
of
the
transit
issue
and
also
convert
the
transient
issue
into
the
permanent
issue.
If
it's
persistent,
there
was
for
some
duration,
so
so
the
event
is
not
reliable
and
we
are
not
the
we
basically
from
day
one.
We
introduce
event
that's
most
of
the
debugging.
It
is
not
a
for
system
to
rely
on,
it
is
take
discovery
and
the
issue
or
recovery
issue.
So
that's
not
the
kind
we're
using
just
to
share
with
you
here.
Yeah.
A
Okay, let's move to the next topic. Peter and Robert, do you want to talk about the cAdvisor stats? Thanks.
E
Hey, thanks Dawn. First time caller, long time listener. My name's Peter; I work for Red Hat, mostly working on CRI-O. So, for a bit of background: CRI-O, for the past, well, forever, has been using cAdvisor stats in the kubelet, because we found some initial performance regressions when we switched to CRI stats. So we're finally in the works of actually making the full switchover and finding ways to make it more performant.
E
So I started off by making some changes in CRI-O that I thought might help a little bit, but they didn't do quite as much as I wanted, and I'll have Robert describe it a little bit. But basically, we tested four different versions: one with cAdvisor stats and CRI-O, one with CRI stats with an improved CRI-O implementation.
E
And then I found some indication that there was still work being done on the cAdvisor side, even though we were using CRI stats. So I have a WIP PR that I put up on the doc, and testing with that was actually much more promising. So, Robert, if you want to talk about the difference.
H
I'm just going to go into a little background on what's motivating this from our end. I'm Robert Crowets; I used to actually be working on the node team here, but I switched over to the performance team last year. One of the projects I'm working on involves using an alternative runtime.
H
It's not appropriate to port cAdvisor to know how to talk to the other runtime, so that's sort of where I come in, in addition to being on the performance team. I'm going to try to share, I'd like to be able to share, a window here. Please.
H
Okay, this is a CPU utilization plot here, by process; it's using a tool called pbench.
H
Suffice
it
to
say
that
I've
instrumented
the
worker
node
on
which,
on
which
my
test
is
running,
my
test
involves
creating
64
pods
each
with.
H
So this graph here is the base case, using cAdvisor stats and unpatched CRI-O. The red here is the CPU utilization of the kubelet; as we can see, it's averaging about 19 percent throughout the first part of this. On the left is when everything was being created, and then everything just idled for 20 minutes in steady state.
H
Let's see, CRI-O here, yeah, CRI-O here is less than one percent CPU utilization.
H
So, no questions on that one; we'll move on to the second one, which was with CRI-O patched to cache the stats result. As we can see here, the kubelet actually consumed rather more CPU; it consumed about 25 percent of one core here.
E
Yeah, so here, basically (and Robert will show the patched kubelet in a little bit), using CRI stats, cAdvisor is doing some work that CRI-O is also doing: the kubelet is basically asking CRI-O to do that work, to give the results, and then also asking cAdvisor to get the results, and resolving between the two, basically just choosing cAdvisor.
H
What I neglected to mention: initially, looking at the totals, average CPU consumption was perhaps 23 percent or thereabouts in the default case. In this case it was using probably close to about 33-ish percent, so somewhat more CPU utilization coming out of that.
H
The next graph here is CRI-O with patches and the kubelet with patches to not call into cAdvisor. In this case we're seeing kubelet utilization fall to about 15 percent; CRI-O is still somewhere in the range of three percent, but the total average CPU consumption now is only about 20 percent of one core. So this is a substantial improvement. And finally, for the last one here, I'm going to have to ask Peter to explain it.
E
Yeah, so this one is without the CRI-O fixes; the CRI-O fixes are basically just emulating what cAdvisor does by caching the disk stats results. So I ran it with vanilla CRI-O, where it's actually walking the entire file system to calculate the disk usage for every CRI stats call, but this has the kubelet patches that drop the cAdvisor calls. I wanted to show, in isolation, what dropping those calls does.
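As a rough illustration of the caching idea (a hypothetical sketch, not the actual CRI-O patch): computing disk usage means walking the container's filesystem, so the result can be cached with a TTL, the way cAdvisor amortizes it, and stats calls in between serve the cached value.

```go
// Hypothetical sketch of caching expensive disk-usage stats, in the spirit of
// what cAdvisor does and what the CRI-O patch emulates: walk the filesystem at
// most once per TTL and serve the cached result to every stats call in between.
package main

import (
	"io/fs"
	"path/filepath"
	"sync"
	"time"
)

type diskUsageCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	bytes   map[string]int64     // container root -> cached usage in bytes
	fetched map[string]time.Time // container root -> last walk time
}

func newDiskUsageCache(ttl time.Duration) *diskUsageCache {
	return &diskUsageCache{
		ttl:     ttl,
		bytes:   map[string]int64{},
		fetched: map[string]time.Time{},
	}
}

// Usage returns the cached usage if it is still fresh, otherwise re-walks the tree.
func (c *diskUsageCache) Usage(root string) (int64, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if t, ok := c.fetched[root]; ok && time.Since(t) < c.ttl {
		return c.bytes[root], nil // fast path: no filesystem walk
	}
	var total int64
	err := filepath.WalkDir(root, func(_ string, d fs.DirEntry, err error) error {
		if err != nil || d.IsDir() {
			return err
		}
		if info, err := d.Info(); err == nil {
			total += info.Size()
		}
		return nil
	})
	if err != nil {
		return 0, err
	}
	c.bytes[root] = total
	c.fetched[root] = time.Now()
	return total, nil
}
```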
H
Okay, so here we're seeing the kubelet taking about 20 percent, CRI-O taking in this case about two and a half percent, and in total we're looking at maybe 24 percent. I'm going to compare this to the original, at least when my browser is willing to switch, where again we see it was in the same range, 20-something percent.
E
Yeah, thanks Robert. So basically, the point of bringing this here is that I put up a PR: there's some duplicated work in the CRI stats path that I think should be optimized out, and I wanted to come here to see why we, by default, call into cAdvisor when we should be able to get the data from CRI stats. The one stat that I don't know if we can actually get from the CRI stats call (I haven't looked into it very hard) is the process stats.
E
That's the one thing I'm not totally sure about, whether we get the process number from the CRI stats call; but everything else seems like it's duplicated with the CRI stats, and we just default to using the cAdvisor version if it exists. I'm wondering if there's a different approach, like dropping those calls, or maybe changing the priority so that if the values exist in the CRI, we use those instead. So I just wanted to bring that up.
H
So let me just comment on that. The use case we envision here is for virtual machines: the container running inside a virtual machine is managed as a pod, so cAdvisor on the host would not be able to extract meaningful data; all it would be able to extract data for would be the VM.
A
So
there's
the
is
the
frontal
state
weather
it's
just
legacy.
Leader
say:
the
weather
is
like
initially
cerebral
start
before
kubernetes
this
one
this
year
and
even
before
we
have
this.
So
then
we
have
the
sid
weather
came
from
the
same
team
and
from
the
book
team,
google
team,
and
so
so
initially,
when
we
start,
the
monitoring
is
just
like
the
whatever
monitoring
available
and
we're
just
using
so
we're
just
using
cellular
monitoring.
A
But
you
can
see
that
when
we
first
start
the
container
runtime
interface,
we
talked
with
docker
team
and
company
community,
so
we
basically
want
to
using
cis
that's.
This
is
why,
from
day
one
we
do
but
over
time.
So
that's
why
we
have
like
the
stats,
related
api
and
so
but
over
time.
There's
the
first
thing
initially,
I
think
the
even
at
the
ci
alpha
release
and
the
monitoring.
There
are
many
debate,
so
so
it's
not
finalized.
So
so
we
have
to
like
the
alpha
and
the
law
out.
A
That's
more
finance
and
also
second,
one
is
continuity.
Implementation,
even
the
api
is
finalized.
Implementation
is
not
ready,
so
initial
name,
so
we
pound
on
that.
One.
Then
later
we
have
the
cryo,
so
when
we
want
to
think
about,
we
can
switch.
We
had
just
performance
concern,
so
that's
why
we
are
kind
of
hold,
because
we
want
to
both
continue
d
and
this
cryo
and
get
graduate
from
the
incubator.
So
we
delay
that
one.
So
after
that
I
keep
heard
like
the
class
performance.
A
Have
some
performance
concerns?
So
that's?
Why?
And
just
recently
I
heard
that
that
problem
is
going
to
be
fixed,
so
big
is
just
legacy
really.
We
do
want
to
switch,
but
I
I
I
knew
continuity
will
be
behind,
like
the
using
continuity
will
be
similar
data
here.
But
hopefully
someone
can
perform
like
the
similar
performance
analysis
and
generate
the
similar
data
here.
So
now
we
could
basically
switch
because
we
do
have
like
the
signal
that
basically
is
kinda
like
both
of
the
two
container
runtime.
Why?
So?
A
We
made
a
lot
of
great
decisions
for
the
both,
and
I
want
to
make
sure
we
also
have
the
container
d
data
here
and
then
we
could
say:
okay,
let's
switch
to
the
cryo
stats.
E
I was just checking whether containerd uses CRI stats instead of the cAdvisor stats.
B
Oh yeah, on this one: we don't have any specific plans to transition there. I think David is working on that, and we don't have any timelines for it. I think generally, strategically, we may want to go there, but for now there are no actual plans.
J
Yeah, sorry, just to add: for containerd, it is using CRI stats, but, like you've seen in the kubelet, when you're getting the CRI stats it's still fetching the cAdvisor stats too, so yeah, okay, cool. The difference, though, between containerd and CRI-O in terms of stats and stuff is:
J
I
did
notice
that
in
kublet
there
is
a
function
called
like
using
legacy
c
advisory
stats
and
that's
using
for
summary
api,
and
for
that
there's
actually
a
specific
check,
that's
hard
coded
in
kubelet,
and
it
basically
checks
if
you're
using
cryo
and
I'm
not
sure,
on
the
whole
background,
why
that
was
put
in
but
there's
a
basically.
If,
if
you're
using
cryo,
then
it
will
default
to
using
c
advisor
stats.
In
addition
for
image
sets
as
well,
yeah.
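A paraphrased sketch of the hard-coded check described here (assumed names and details, not the literal kubelet source): the stats provider is picked by inspecting the remote runtime endpoint, and a CRI-O socket routes the summary API to the cAdvisor-backed provider.

```go
// Paraphrased sketch (assumed names, not the literal kubelet code) of the
// hard-coded provider selection described above: if the configured remote
// runtime endpoint looks like CRI-O's socket, fall back to the cAdvisor-backed
// stats provider for the summary API; otherwise use the CRI stats provider.
package main

import (
	"fmt"
	"strings"
)

func usingLegacyCadvisorStats(runtimeEndpoint string) bool {
	return strings.HasSuffix(runtimeEndpoint, "crio.sock")
}

func main() {
	for _, ep := range []string{
		"unix:///var/run/crio/crio.sock",
		"unix:///run/containerd/containerd.sock",
	} {
		if usingLegacyCadvisorStats(ep) {
			fmt.Println(ep, "-> cAdvisor stats provider")
		} else {
			fmt.Println(ep, "-> CRI stats provider")
		}
	}
}
```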
E
Yeah, so for clarity: in these tests we did remove that line, and that line was added because of these performance issues that we're now rehashing and trying to resolve. We added it because we were concerned about the duplicated work but didn't have the time to fix it; now we're, you know, getting back to it. So part of this effort would also be removing that line at some point, once we've decided it's performant enough, and actually using, you know, CRI stats, the same as containerd.
J
Got
I
got
it?
Oh
yeah,
that
makes
sense
yeah
a
couple
comments
I
wanted
to
make.
I
mean
first
of
all
really
awesome
thanks
for
doing
this
kind
of
performance
analysis
and
getting
all
this
data.
One
thing
that
might
be
additionally
helpful
is
we
can
provide
like
a
cpu
profile
of
kubelet
itself.
J
That way we can see exactly which part of the kubelet is taking up this extra CPU and why. For example, even with the changes you made, I believe cAdvisor is still collecting the stats, so it'd be nice to understand and really pinpoint: is the additional cost in the fetching and merging of the cAdvisor and CRI stats when you're calling out to get the metrics, or is it in the background, the fact that cAdvisor is still collecting the stats?
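For reference, one way to grab such a profile, assuming the kubelet's debugging handlers are enabled and `kubectl proxy` is running locally ("node-1" is a placeholder node name): fetch a 30-second CPU profile through the API server's node proxy, then open the saved file with `go tool pprof`.

```go
// Hypothetical sketch: download a 30s kubelet CPU profile via the API server
// node proxy (assumes `kubectl proxy` on localhost:8001, debugging handlers
// enabled on the kubelet, and a node named "node-1"). The saved file can then
// be inspected with `go tool pprof kubelet.pprof`.
package main

import (
	"io"
	"net/http"
	"os"
)

func main() {
	url := "http://localhost:8001/api/v1/nodes/node-1/proxy/debug/pprof/profile?seconds=30"
	resp, err := http.Get(url)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, err := os.Create("kubelet.pprof")
	if err != nil {
		panic(err)
	}
	defer out.Close()

	if _, err := io.Copy(out, resp.Body); err != nil {
		panic(err)
	}
}
```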
E
I was also a little bit confused by that, but my guess, not really knowing, is that it's from the kubelet never requesting it. I don't know how cAdvisor decides to start looking at, you know, a particular container or directory, but maybe, with the kubelet not asking for it, cAdvisor just doesn't try. I don't really understand the interaction right now.
J
Yeah, my understanding is it's still collected, but maybe because the kubelet doesn't call out for it there's some performance headroom, yeah. So that's why a CPU profile of the kubelet itself would, I think, really answer that question: we could see exactly which functions use the most CPU, and that would help narrow down this investigation a lot, I think. So that might help.
A
Oh, sorry, I just wanted to make one comment. So, David, can you do a similar experiment, or maybe work with Peter and Robert to do the same in their experiment, so we can see the containerd stats? Then we can make the decision to move forward. We've wanted to switch to using the CRI stats for a while, right; the only constraint is just the uncertainty about the performance and the resource utilization, all those kinds of things.
J
Yeah, definitely, I think that makes sense; yeah, we'll do the same type of experiment on the containerd side and make sure it works there as well.
E
So the one other quick question I had: another thing we should investigate, on both of the CRI implementations, is whether there are parts of the CRI stats that we may not be filling in, which is what motivated us to use cAdvisor to augment those stats. That's something I will personally look at; I mean, with this PR, it seems like nothing breaks horribly.
H
Again, putting on my hat for the other project here with Sam, with containers running in VMs: cAdvisor will be of no use for looking inside the VM.
A
Yes, and yeah. Also, once we switch to the CRI stats, this effort can help us refactor cAdvisor and make sure we have, like, a library for the core metrics, or the node metrics.
A
The
cellular
already
been
refracted
a
couple
times
to
satisfy
cognitive
needs
in
the
past,
but
it's
not
done
yet
in
the
cri
we
basically
design.
We
basically
say
we
want
to
refraction.
It
is
and
then
make
this
the
only
circuit
of
the
node
related
of
the
core
matrix
and
from
the
part
level
matrix
and
then
give
after
all,
the
container
related
metrics
to
the
csi.
So
then
we
can.
We
can
talk
about
more
how
to
refactor
that
one
and
then
how
to
use
it
to
link
back
to
the
kubernetes,
how
to
evolve.
J
Yeah, definitely. I think that's been a long, long effort, and starting with making sure we can fully rely on CRI stats for all the endpoints is definitely the first step; then we can think about doing that, for sure.
A
And after we have the refactoring, Robert, then there are those libraries, some of which you could use and link against for the monitoring. Then you'd have, like, per-pod ones, or ones for a couple of container runtimes, some like the VM one, and that could be linked; this is the original idea I proposed with the third container runtime side, I forgot the name. Hopefully we'll finish that one, and it will also graduate from the incubator.
C
I can jump in a little bit on the state of issues, which is that we do not have them in a very well-triaged state, and for the most part I think people are focusing on the backlog of issues based on, you know, the kinds of bugs that they're seeing in production, and using that to prioritize. We have, I think, over 500 open bugs right now in SIG Node, or something around there, and so we need to go through and triage them.
C
It's sort of on my long-term roadmap. Right now I'm trying to focus on getting the PRs under control, and then, as soon as we've got that into a good steady state, I'm going to start tackling the issues. But I wanted to say thanks so much for staying up so late; I think it's not a great time for you right now, and it's nice to meet you, and you're great.
A
Nice to meet you too. Also, as I mentioned at the beginning, for the first topic: you tried to help fix that problem, and maybe because it was too late for the release we decided not to take your fix; instead we reverted the original PR, and now we'll look into how to fix that problem.
A
So
just
thank
you
for
your
attempt,
fixing
the
issue
and
and
also
there's
the
subnet
shared
the
dock
and
in
the
signal
when
it
is
actually
shared
created
by
the
menu
and
about
like
the
kind
like
the
plan
for
q1
right.
So
there's
something
in
that
list
and
also
don't
have
owner
and
also
some
require
of
the
more
reviewer.
A
Maybe
you
can
start
from
there
and
the
last
last
quarter
actually
also
directly
share
the
one
dog
you
can
find
that
dog
and
for
the
clean
state
like
we
have
many
many
feature
still
is
captured
as
the
alpha
or
beta.
We
want
to
promote
some
from
there
from
the
alpha
to
beta
beta
to
ga,
so
the
sum
of
those
kind
of
things
and
also
no
owner,
and
even
there
have
the
owner
some
of
they
don't
have
the
reviewer.
A
So
you
also
can
start
from
there
and
take
a
look
and
maybe
understand
something
like
the
pr
something
like
a
feature
why
we
decided
not
to
promote
or
how
we
promote.
So
you
can
start
from
those
and
maybe
that's
the
easiest
way
for
people
to
start
looking
into
some
work.
L
Yeah, thank you, Dawn. So we discussed this previously as well: adding a flag in the kubelet to disable the pprof, and that's the issue that we created. But I was wondering: the issue talks about adding one flag to control both of the endpoints, the pprof as well as the flags; I was wondering if we should probably add an additional, or another, configuration for the flags endpoint separately and not combine it with the profiling flag.
A
Suggestions,
I
need
to
refresh
my
memory
about
this
one.
L
Yeah,
so
this
was
basically
like
for
aws,
specifically
like
we
have
a
fargate
instance
where
we
manage
the
cubelet
tasks
on
the
on
our
images,
and
we
don't
want
the
our
customers
to
go
directly
invoke
the
peep
prop,
as
well
as
the
set
the
debug
flags.
L
So that's why we created this issue a long time back. And then, from our initial meeting, what we decided was: we initially kept these endpoints open for debugging purposes, and it's okay to add a flag to disable them. One of our community members was also working on the PR, but they had one flag to control both of the endpoints, and I wasn't sure if that's the right move, so I wanted to see if we should have separate flags.
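A hypothetical sketch of the two options being weighed (assumed field names, not the actual KubeletConfiguration API): a single knob that gates both debug endpoints versus one knob per endpoint, which would let, say, the flags endpoint stay available while pprof is disabled.

```go
// Hypothetical sketch of the design question (assumed names, not the real
// KubeletConfiguration): one flag for both debug endpoints vs. one per
// endpoint, so /debug/flags can stay available while /debug/pprof is off.
package main

import "fmt"

// Option A: a single knob covers both endpoints.
type combinedConfig struct {
	EnableDebugHandlers bool // gates /debug/pprof AND /debug/flags/v together
}

// Option B: independent knobs, the direction proposed in the discussion.
type separateConfig struct {
	EnableProfilingHandler  bool // gates /debug/pprof/* only
	EnableDebugFlagsHandler bool // gates /debug/flags/v only
}

func main() {
	cfg := separateConfig{EnableProfilingHandler: false, EnableDebugFlagsHandler: true}
	fmt.Printf("pprof served: %v, flags served: %v\n",
		cfg.EnableProfilingHandler, cfg.EnableDebugFlagsHandler)
}
```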
J
I just checked today: there's an enable-debugging-handlers flag, but it's a global flag, though, right?
L
Yeah
so
enable
debugging
handler,
like
that
kind
of
takes
care
of
like
most
of
the
endpoints
that
cubelet
serves
and
this
specific
issue
just
for
turning
off
the
p
prof
and
then
the
the
flags
in
point.
A
Sure,
anyway,
even
we
follow
up,
we,
we
will
share
back
to
the
signals
so
yeah
yeah.
Let's
need
to
follow
up
after
this
one.
Then
we
share
back
to
the
you
know
after
that,
yeah.
A
Thank you, thank you. That's all for today; any other topics people want to raise? Otherwise we'll just call it off for today. Nice to see everyone here. Okay, thanks everyone, and talk to you next week. Bye.