From YouTube: Kubernetes SIG Node 20230530
Description
SIG Node weekly meeting. Agenda and notes: https://docs.google.com/document/d/1Ne57gvidMEWXR70OxxnRkYquAoMpt56o75oZtg-OeBg/edit#heading=h.adoto8roitwq
GMT20230530-170349_Recording_1920x1080.mp4
A
Hi, everyone, welcome to the May 30th, 2023 SIG Node weekly meeting. We have a few items on the agenda, so we can get started.
A
So, all right, I think what we can do is maybe next week we can make a pass and create a final list, and this week we can try and go through and review and approve what we can. Sure, I mean, folks that want to drive something: if we want to make sure something is in 1.28, make sure that you have approvers and reviewers who have the time to help you with your KEP.
B
We need to make sure that if you own... make sure that...
A
All right. I think, Sergey, your video is frozen.
A
Yep, okay, okay. So we can move on to the next item on the agenda. So, Harshal, I know you want to talk about swap. Do you want to quickly give an update?
E
Yeah, sure, I'll just give a link here in the chat. So this is the enhancement we are trying to move forward, the swap one, and we would like to take a cautious approach here, considering the impact of swap on the cluster, especially on the node. So we are proposing to enable swap only for the Burstable pods.
E
So the idea behind it is that if you allow users to set the swap values, like you do in the case of memory and CPU, considering how little we know about swap behavior, we may not end up in the right situation. So our idea is to enable it only for the Burstable pods and calculate the values automatically, and this enhancement actually describes in detail how you can arrive at those values, and once we get enough confidence, in the future...
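A minimal sketch of the proportional calculation described here, assuming the share-of-node-memory formula from the swap enhancement; this is illustrative, not the actual kubelet code:

```go
// Sketch: a Burstable container's swap limit is its share of node
// memory, applied to the node's available swap.
package main

import "fmt"

// containerSwapLimit returns a swap limit in bytes. All inputs are bytes.
func containerSwapLimit(memoryRequest, nodeMemory, nodeSwap int64) int64 {
	if nodeMemory == 0 {
		return 0
	}
	// Proportion of node memory requested, applied to total swap.
	return int64(float64(memoryRequest) / float64(nodeMemory) * float64(nodeSwap))
}

func main() {
	// A container requesting 2 GiB on a 16 GiB node with 4 GiB of swap
	// would get 2/16 * 4 GiB = 512 MiB of swap.
	fmt.Println(containerSwapLimit(2<<30, 16<<30, 4<<30))
}
```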
B
Yeah, I think one more limitation we discussed before is whether we can limit everything to cgroup v2 only, because in cgroup v2 we have way more control and security. It will be much better to do it in a controlled fashion, rather than just enabling it for whoever gets it, who will potentially get their secrets exposed and stuff like that.
E
Yeah, yeah. But overall, if I'm sensing it right, we are in support of this moving ahead, and... just let me know on the KEP whether you agree with this approach of cautiously moving forward. We are splitting beta into two phases, and so...
A
All right, thank you. We can move on to the next topic: Peter and Marcus, discovering the kubelet cgroup driver from the CRI. So I just...
G
Yeah, I saw your note. So there is now an outstanding question that, you know, could make sense to bring to the larger team, but yeah. We just wanted to highlight that we have the KEP open and want people to take a look at it. Independent of the open question, we decided to break up the runtime status field into Linux- and Windows-specific fields, because the cgroup driver isn't really that relevant to Windows. But other than that, this is the change we discussed a couple of months ago...
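A hedged Go sketch of that Linux/Windows split; the field and type names here are illustrative, not the KEP's final API:

```go
// Sketch of a runtime status message split by platform, so a Windows
// runtime never has to report a cgroup driver that means nothing to it.
package cri

type RuntimeStatus struct {
	// Platform-specific info reported by the runtime to the kubelet.
	Linux   *LinuxRuntimeStatus
	Windows *WindowsRuntimeStatus
}

type LinuxRuntimeStatus struct {
	// "systemd" or "cgroupfs"; per the discussion, the kubelet would
	// steer to whatever the runtime reports instead of its own config.
	CgroupDriver string
}

// Empty today; exists so Windows-only fields have somewhere to live.
type WindowsRuntimeStatus struct{}
```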
G
A way, you know, for the runtime to report some information up to the kubelet, especially considering that, at this point, the only cgroup driver that cgroup v2 really supports is systemd, and I don't know of really any plans to add cgroupfs support. Like, do we even need to do this, or do we just begin the process of deprecating cgroupfs? I...
G
...think that's a slightly different conversation. But in terms of the question about whether there are other things that we could use this for: I think we had kind of started to talk about that. Part of the initial conversation that brought this up was beginning to think about who the owner of some of these runtime configurations would be, like, right now.
G
There's some... I mean, with the cgroup driver specifically there's a split-brain thing going on. But I can't think of any specific fields that we currently have that should be moved over, because this one's kind of the only one where the kubelet and the CRI need to be in sync. But I know that there was thought about maybe leveraging this for QoS classes, to have the CRI be able to enumerate, like broadcast, which QoS classes exist, so that the kubelet can inform the scheduler about it.
A
Should this be a separate call? Like, are we overloading the runtime status? So that's one thing: should it be like a get-runtime-features as a separate call? That's one thing. And one more thing I want to add is, when we were discussing user namespaces, the question came up: is there a security issue? Like, the runtimes are supposed to fail if they are not able to create a user namespace right now. Do we think we need something like this?
A
Where, when the kubelet is calling into the runtime, it gets a list of supported features. Or say a runtime is too old: it doesn't even know about user namespaces, so it's not failing, and it gets some pod configuration and says, I started it correctly. So do we want this additional runtime check? The kubelet gets a list of features, and if user namespaces is not in the list, it will not even try to start the pod, or it fails. I'm not sure if that makes sense or not.
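A minimal sketch of that check, with invented message shapes (the real CRI API for this is exactly what's being discussed):

```go
// Sketch: fail fast when the pod wants a feature the runtime does not
// advertise, instead of an old runtime silently ignoring the field.
package main

import "fmt"

// Hypothetical shapes for the example.
type RuntimeFeatures struct{ UserNamespaces bool }
type Pod struct{ WantsUserNamespace bool }

func canStartPod(p Pod, f RuntimeFeatures) error {
	if p.WantsUserNamespace && !f.UserNamespaces {
		// The kubelet refuses up front rather than letting the runtime
		// report "started correctly" without the namespace.
		return fmt.Errorf("runtime does not support user namespaces")
	}
	return nil
}

func main() {
	fmt.Println(canStartPod(Pod{true}, RuntimeFeatures{false}))
}
```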
G
Yeah, I think having a separate call is fine with me. I think we just stuck it in here because this was a place where the kubelet was asking the CRI: hey, what do you have for me in terms of runtime status? But I think a separate one makes sense. Figuring out what the communication method would be, what the schema would be, might be a little bit tough, though, because the user namespace one is more of a "do you support user namespaces", whereas the cgroup driver one is "which cgroup driver...
G
...are you choosing". But I think that, generally, having a way for the kubelet to ask the CRI for information, and having that be all wrapped up into one, makes sense, yeah. I don't disagree with what you're saying, Ronald; I just think we should keep this KEP focused on minimizing misconfiguration issues between the runtime and the kubelet. I think that just language in this KEP saying, if the runtime reports something, the kubelet will then steer to that rather than its config, is fine. As for the broader issues: there are still things, obviously, that the kubelet is setting up that runtimes are not setting up, and whether that's which cgroups are actually present on the system... all that gets into a bigger can of worms. Like, someone was asking this morning about PIDs and their enforcement, and that's still just a kubelet-oriented feature, so...
H
Anyway, personally, I would just keep this at the discoverability thing, and not necessarily feel like we need to solve all the problems right now.
D
So I think in this case, though, the question is: what is the preferred cgroup management model of the host, node-wide? Whereas the runtime classes are pod-specific. And then, I don't think we can actually have different managers of the unified cgroup hierarchy, so this is basically just saying who that manager is. Correct me if I'm wrong, Peter, right?
G
Yeah, yeah, we're not trying to... like, containerd has the option to have cgroup drivers per runtime class, but we couldn't really think of a reason to integrate that, so we're not including it. I think the runtime classes were brought up because that is an option for the CRI to tell the kubelet: this is what I have for you, like...
G
This is what I'm supporting, go ahead. Instead of having the CRI configuration say, here are all the runtime handlers I have, and then have the kubelet turn those into API objects created in the kube API, and have those need to agree: that would come from the bottom up.
G
I think that was the motivation for mentioning it: similar to the QoS classes, where the information is going to bubble up from the CRI. The CRI is the one that owns it and bubbles it up to the kubelet, which will then tell the scheduler and everyone: here's the state of this node.
G
I also think that it is relevant to discuss the general pattern of the CRI passing information up to the kubelet, to make sure that we have the API extensible in such a way that we could support these other options. Because I agree that, as we go forward, there will be other situations where we want to extend this: the CRI will be telling the kubelet much more in the future.
G
So I think a separate CRI call makes sense to do this, so that the kubelet can ask it at different times than it asks for the runtime status, and that will give us more flexibility. But for this KEP we're just going to add this stuff, and then we can think about adding other stuff later, or in a parallel KEP.
A
All right, so I think we have a plan; thanks for the discussion, folks. Move on to the next item: pod phase change when containers exit with zero. This looks like a new issue that was just opened yesterday. Is anyone on the call who can speak to this one?
K
So the PR was to ensure that the terminal phase is assigned to all pods, but the focus was on deleted pods. But as a side effect, in a couple of scenarios, the phase is now different than it was when all containers exit with exit code 0 and the restart policy is Always. So in a couple of scenarios that I listed in the table, it was Failed, but now it's Succeeded. So, first of all, I would like to discuss: is it a bug?
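A toy model of the semantics being debated here, not the kubelet's actual code: after the PR, the terminal phase is effectively computed from container exit codes, so a pod stopped by a kubelet component that used to write Failed can now end up Succeeded when every container exited 0.

```go
// Sketch: terminal phase derived purely from container exit codes.
package main

import "fmt"

func terminalPhase(exitCodes []int) string {
	for _, code := range exitCodes {
		if code != 0 {
			return "Failed"
		}
	}
	return "Succeeded"
}

func main() {
	// All containers exited 0: Succeeded, even if eviction stopped them.
	fmt.Println(terminalPhase([]int{0, 0}))
}
```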
K
But if it's not a bug, then we should probably clean up the code now, because these components, in the code, set the phase to Failed, but it's ignored, because the phase is now computed basically from the container exit codes. So it's misleading. Any views on that?
A
I,
don't
have
anything
of
the
top.
Has
anyone
else
looked
at
it
closer
to
be
able
to
give
guidance
or
we
want
to
take
it
asynchronous
and
focus
can
review
and
chime
in
there.
D
So basically, right now, for terminations triggered by eviction or preemption or whatever, the phase is Succeeded, associated with some reason, which is a good thing. But the problem is, we just don't know how existing controllers, and the controllers customers implement, are going to consume those kinds of things. It's a semantic change for the user. I'm not saying this is wrong or right; let's just say this is how the ecosystem consumes it.
K
So this might be something to look at, although it's rare for users to use jobs that, on SIGTERM, would exit with zero exit codes like this; this is sort of asking for trouble anyway. But yeah, it might be an issue, I'm not...
K
Does this answer your question? So... yeah, the pod... but yes. So when you say evicted, it's eviction due to node pressure, for example? Yes...
K
...no, the job controller will consider this as succeeded, so the job will be succeeded. So I wonder, what about the different scenario? Yes, in this case the controller will just consider the pod as failed, and the entire job will succeed.
K
Which is probably okay. I mean, if you handle SIGTERM and you exit with zero, normally you did, like, checkpointing or something. If you get SIGTERM and exit with zero, it means that you control the flow... that sort of thing, off the top of my head.
K
Yeah, so for the job controller I think it's fine, but yeah, in the wild, customers may have different... yeah.
D
Definitely there are some custom controllers, but I think that, at least for the out-of-the-box Kubernetes controllers, if we could scan through all of those controllers and figure out how they handle it, that would be good, right? So at least that's the minimum requirement. Otherwise this is risky.
K
I would try to consult on this with Clayton, who participated in the PR, and yeah, I also asked him. I think we should just clean up the code, right, for the different subcomponents of the kubelet to not set the phase to Failed, because that's misleading, right?
M
Okay, can everyone see my screen? All right, cool. Okay, let me just give a quick intro. My name is Andrew Stoycos. I work mostly in SIG Network, in the network policy API subgroup, but more recently, as part of my company's endeavors... well, I work for Red Hat in the office of emerging tech, and we've been working on a new project called bpfd. I poked my head in last week just to see if y'all would be interested, and I didn't get any...
M
...no. So I'm here to do a quick presentation. I'm going to try to keep it pretty short; I'm not going to do a live demo, just out of concern for time, because I know we have a lot on the agenda. But still feel free to raise your hand during the presentation, and I can stop and try to answer any questions.
M
Okay, so I'm not going to dive deep into what eBPF as a technology is. I think many of us have heard that buzzword, because it's been used as a marketing technique by many companies. People are really excited about it, and it's kind of this new general-purpose framework that allows users to, I won't say easily, but allows users to run sandboxed programs in the kernel without having to change kernel code. It can be used for a ton of different purposes.
M
I come from a networking background, and that's kind of where it started, but now it's been expanded to include use cases for monitoring, tracing and security. And so, because of this, we aren't just presenting our work to SIG Network, or SIG Node, or SIG Security; we're trying to go for all three, because we think eBPF and Kubernetes kind of have a lot of overlapping uses, and it can affect the security of a cluster at large. So all three SIGs are kind of important. And eBPF in Kubernetes itself has been majorly on the rise.
M
Obviously, if you've been to the past couple of KubeCons, it's come up more and more, Cilium being one of the big drivers of that. There are a bunch of other examples as well.
M
Apart from the Cilium and Calico CNIs, you have Pixie, which is around observability; KubeArmor, which is runtime enforcement; Blixt, which is from Kong and being donated to Kubernetes SIGs, and is going to be a Gateway API L4 conformance implementation, another project I'm working on along with Shane from SIG Network; and then we also have another example in NetObserv, which is an open source observability operator. They all kind of rely on eBPF as their underlying technology.
M
So, although the proliferation of eBPF in Kubernetes is great, it leads to a lot of kind of interesting problems. One of the main ones is that eBPF today always requires privileged pods. Specifically, it needs CAP_BPF to load and manipulate BPF programs and BPF maps; that's the very minimum. Most often, in real deployments, you need even more. As we can see, I did a sample of the NetObserv operator, and we need all these caps listed below.
M
Another big problem with running BPF in Kubernetes is there's no cooperation mechanism right now. So if one operator tries to load a program on the same hook as another operator, things can get messy, things can get nondeterministic. There was actually a really good talk at the last KubeCon, which I'll need to link here, from Datadog and Cilium, on an explicit case of this happening, and it was almost impossible for them to figure it out: a whole KubeCon talk dedicated to figuring out what was broken and what was going on.
M
So this is obviously one of the biggest problems. It's also really hard today to debug these problems, like I just mentioned. And then, additionally, there's a ton of duplicated code and functionality across applications which want to deploy BPF in Kubernetes. So these are kind of the main challenges we see from the bpfd community, and some of the main challenges we've set out to solve with bpfd.
M
So let's hop into what it actually is. Obviously, it's an open source project. It started in Red Hat's emerging tech networking group, and it actually started in the Red Hat ET GitHub organization, but it has now been moved to an unaffiliated bpfd-dev organization. So, a fully open source project.
M
We are also listed on the ebpf.io projects page. And what it essentially is, is a system daemon for managing BPF programs. This means that we're managing the full life cycle of BPF programs, loading and unloading, and we're providing privilege separation, so that only our daemon is privileged while our users don't have to be, which is really important for a lot of security use cases. It does this today for XDP and TC programs, which are both network-oriented programs.
M
But in the future, this functionality will one day actually be built into the kernel, and when that happens, we'll use that implementation instead. We're also going to be providing a lot of policy, security and visibility tools around loading BPF: we'll give fine-grained control over which users can load BPF and what hook points they can load to, and we'll also give a lot of visibility into what BPF programs are loaded onto a system and whether any are malicious...
M
...based on our analysis. So, some details: bpfd is developed in Rust. It's built on top of a Rust BPF library called Aya, and one of our main use cases is focused on deploying BPF in Kubernetes, which is why I'm here. We also have a Kubernetes operator, which is all written in Go, so we're not asking anyone to learn Rust, but Rust has its perks as a system daemon language. And today, people are already writing their eBPF-enabled applications for Kubernetes using existing libraries such as cilium/ebpf, Aya, libbpf, etc.
M
So next I'm going to show two slides that are kind of just images; I really love images. The first one shows what BPF deployment in Kubernetes specifically looks like today. As I said, we have a few of our BPF-enabled applications on the bottom. This is on a single node. They're all deploying daemonsets in order to load their BPF applications, and these daemonsets are all privileged.
M
They all require CAP_BPF, and they all use kind of their own stack, whether it's libbpf-enabled, cilium/ebpf-enabled or Aya-enabled, to load and manage their BPF programs. So, as you can see, there's a lot of room here for duplicate functionality, and there's not really any segmentation of capabilities.
M
Every application today has to do this, and we don't think this is the way things should be done in the future. We're hoping for it to look something more like this, right? So now your BPF-enabled applications fit in their own layer, and they do not need CAP_BPF, because they aren't the ones actually loading the BPF programs. Instead, they simply create their program...
M
...specific CRD object, whether it's an XdpProgram CRD, a TcProgram CRD or a TracepointProgram CRD, and bpfd, which is the privileged entity in the system, will load their program and manage all their map pin points. Then what the applications do is use their existing map management library in order to interact with those programs on the host.
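A hedged Go sketch of that flow; the type names are modeled on bpfd's public CRDs, but the exact schema lives in the bpfd-dev repos, and the image reference below is hypothetical:

```go
// Sketch: the unprivileged app only creates this object; the
// privileged bpfd daemon loads the program and manages its map pins.
package main

import "fmt"

// Illustrative shapes, not the exact bpfd schema.
type XdpProgramSpec struct {
	BytecodeImage string // OCI image carrying the BPF bytecode
	Interface     string // where to attach
	Priority      int32  // ordering when several programs share a hook
}
type XdpProgram struct {
	Name string
	Spec XdpProgramSpec
}

func main() {
	p := XdpProgram{
		Name: "pass-counter",
		Spec: XdpProgramSpec{
			BytecodeImage: "quay.io/example/xdp-pass:v0.2.0", // hypothetical ref
			Interface:     "eth0",
			Priority:      50,
		},
	}
	fmt.Printf("%+v\n", p)
}
```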
M
So the overhead of integrating with bpfd stays somewhat small, but we still get the observability and security benefits of using a centralized daemon, and it also allows other Kubernetes users to dynamically deploy BPF programs to the cluster. So you could have some core infrastructure operators in your distro that need to use BPF, and they could use bpfd; but then, if you wanted to open up the ability for customers to do so as well, you could, which is kind of a cool feature.
M
So we're all here for Kubernetes, right? And I think one of our biggest focuses in the bpfd community is how we make this work on Kubernetes, and the value-add we bring to Kubernetes. So part of that was writing an operator; I already mentioned it's written in Go using the Operator SDK, which many of you folks are really familiar with. You can really easily test it from our project today, which is a simple make target: make run on kind.
M
It includes a couple of Kubernetes APIs. The first one... this slide is actually stale, because we just changed our API: we have dedicated program types instead of BPF program configs. So TcProgram, XdpProgram and TracepointProgram are the types supported today, and we're also hoping to expand to uprobe and a few others in the near future.
M
In addition to that, we have a BpfProgram CRD. This is used to store per-node metadata, and it's also soon going to be used to enhance the observability of a cluster: bpfd will be able to report back to admins all of the BPF programs that are loaded on their cluster, not just the ones that were loaded with bpfd, but everything that's running on your system. So an admin will be able to go: okay, list, show me all the BPF programs around my cluster that aren't controlled by bpfd, and what are they doing, right?
M
What are they? You can't do that very easily today. The last thing we include is a ConfigMap: instead of having a dedicated CRD for configuring the operator, we are just using a ConfigMap, because there aren't many things to configure. The last really cool thing to note here is our BPF bytecode image spec. We've written this in order to solve the problem of distributing BPF bytecode. Today, the way it works is that bytecode is often embedded into the binary of the user-space application that's loading it.
M
So, therefore, in order to release a new BPF program version, you have to release a new user-space version too. What we've done is package BPF bytecode into OCI images. So now you can have fine-grained versioning and control over your BPF program, and you get all of the benefits of a standard OCI container image, such as signing. That's really cool, and it allows us to integrate with Kubernetes a lot more easily.
M
Okay, so I am not going to actually do this demo in front of you, because it's going to take some time; I'll just talk through it really quickly, and the slides have explicit instructions for doing it. Pretty easy, all you need is kind, so you can give it a try. But what this demo shows is two XDP programs being attached to the same interface on every node in your cluster. One is clobbering the other at first, and then you change the priorities, and now it's no longer clobbering.
M
So it's a very simple demo, and we show clobbering versus not clobbering by just counting packets. This is a deeper image into that demo, so please go check it out.
M
If you want, it's a lot easier to run now; we just got done with our 0.2.0 release, so everything should stay working great. So I just gave you all a really fast, 10-minute overview of eBPF and bpfd, trying not to take up everyone's time. But we're more interested in why we're here: trying to figure out what this looks like for Kubernetes. Obviously, we brought this up to SIG Network already, around: where should we continue the discussion? And out of that...
M
We've actually created a new Slack channel for bpfd, and a Slack channel for eBPF in general in Kubernetes, so you can find us in either place. We're also trying to figure out what role the SIGs, Network, Node and Security, want to play here, right? Like, obviously SIG Node would care about this technology, because a user could really easily break a cluster, right? I mean, you could tear everything apart with BPF today super easily, and that includes the kubelet and everything around it, right?
M
So we really want to have multiple SIGs involved, and we're trying to figure out if that needs to be a dedicated working group under a certain SIG, or its own SIG, probably not, but those are just ideas being thrown around. And then the last question we want to ask is: could we see some of these APIs being endorsed by multiple SIGs? Meaning, maybe everyone doesn't want to use bpfd, but we all agree here that having APIs to control BPF in a Kubernetes cluster is smart.
M
So why don't we put those upstream and let various implementations flourish? And then the last thing: a short roadmap for bpfd. You can see a bunch of stuff we've gotten done, super excited about that, and you can also see our tracking project for more. We have a bunch of cool features in the pipe around observability, and some other cool stuff like being able to attach XDP and TC programs to interfaces based on pod labels.
M
So if you want to attach a TC program to all pods with label X, you can do so. So, yeah, I just want to open the floor up for any questions. The last slide is just links. Thanks so much for your time today, and yeah, if we don't get to your questions here, you can find us in Slack at #bpfd or #ebpf. So thanks so much for your time today.
B
Yeah, I have a small follow-up question on that. You said that SIG Node will care about it because of security and reliability. I'm also curious how many problems you experience with attributing eBPF signals to specific pods and processes. I know Pixie had a lot; I've been talking to them, and they had a lot of problems with attributing the signals they receive from eBPF to specific pods.
B
So if you receive eBPF events at the kernel level, you don't necessarily know which pod they belong to. You know some event about the process, but you don't know which annotations it has, which image the process is built from, and so on. And I understand that many providers may want this information, and they will all need to hook up some eBPF probe to process-start events, or something like that. So do you experience any problems with that? Do you provide any solutions?
M
I don't think we provide any solutions for that use case yet, but this is why we came to y'all; that sounds like a really great first use case. If you could even just jot a note down under my agenda topic, I'm happy to make an issue and see what we can do in the future.
H
In one of our projects we were doing the same thing, by inspecting, from the eBPF program, the cgroup path of the process. It's not really trivial, but it's doable.
M
Yeah, if we can deduplicate effort around tracing, logging and debuggability of BPF programs, that's kind of what we want to do, especially in our operator, and we can do that because bpfd can provide any node-specific metadata that we would need, hopefully, to report that back to the user. So...
M
Yep, and I think this brings up some good stuff. I think the next time I come back here, or someone from my community comes back here, we can give you all a presentation on a distinct, short observability tool or use case we've done. That would be really cool: if we can show, like what you're saying, a BPF program loaded on a node by a process we don't recognize, and here we're giving that information back to the user.
M
Cool. So then, I really appreciate y'all's time. We are having a generic eBPF-in-Kubernetes meeting coming up on June 5th; you can go on our channel and find out more, and we have a weekly bpfd meeting every week. There's more information in our GitHub and on our website. Please reach out with questions; I'll keep in contact here. We want use cases coming from SIG Node, and we want to see how we can make this better. Really excited about it. Thank you.
A
Great. So the next item on the agenda is CRI pull image with progress.
O
Yes, it's mine, hello. I'm just bringing this up again; this was started last year. Unfortunately, I lost time to work on this since then; now I'm back in the game, sort of. So I did a bit of a proof of concept for the kubelet implementation, for how it will be used. For those who missed it, or maybe don't remember because it was so long ago, briefly: this is about extending the CRI to support not just a...
O
...pull image request, but also pull image with progress, so that the runtime will send back information every now and then about what stage the image pull is at at the moment. And the request is supposed to be parameterized, and the runtime is supposed to act upon the parameters. For example, the image can be requested to be pulled with the progress reported every one gigabyte, or every 30 seconds, or maybe every five minutes, or maybe every 25 percent downloaded, and then...
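An illustrative Go sketch of the proposed extension; every name below is invented for the example, not the KEP's actual CRI messages:

```go
// Sketch of a parameterized, progress-reporting image pull.
package cri

type PullImageWithProgressRequest struct {
	Image string
	// Granularity knobs the kubelet could set, per the discussion:
	// report every N bytes and/or every N seconds.
	EveryBytes   int64
	EverySeconds int32
}

type PullImageProgress struct {
	BytesPulled int64
	// TotalBytes may be unknown: with the current distribution-spec
	// API the runtime often can't learn the full size up front.
	TotalBytes int64
	Done       bool
}

// In gRPC terms this would be a server-streaming RPC: one request,
// a stream of progress messages, ending with Done set.
type ImageServiceWithProgress interface {
	PullImageWithProgress(req PullImageWithProgressRequest) (<-chan PullImageProgress, error)
}
```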
O
...is the image available, or should I wait for five minutes until the runtime just fails with a timeout, not being able to reach the registry? And for that purpose, it's nice to have events published on the pod object when the image is being pulled, just so that the owner of the workload knows that something is going on, and can judge approximately how much time is left. Okay.
A
Thanks, Alex. I think that in general sounds useful, but I want to make sure that we are also covering something that dockershim used to have. So with dockershim, right, if an image was taking too long, then the kubelet was able to talk with Docker and give it more time to pull the image. Right now with CRI, if we have a very big image, we could be timing out, and it could result in trying to pull the image again.
O
Right. Anyway, so, to review the CRI proposed change: I think the comments last time were that it would be nice to see a sketch of the design, of how it will be used in the kubelet. So now that's there. I didn't find anything obvious that would be wrong with it, but I will now consider what Bruno just mentioned, of course, and otherwise it's open again for comments.
O
Do you think it's worth getting rid of it straight away at this moment? If the percentage is hard to calculate, should we just not consider it?
N
I think so. It's just that, with the current distribution spec API, we really don't know what the size is going to be.
N
Time is the problem with large ones, right? It's more of a, yeah, timeout with progress. So, if you haven't had, say, a megabyte over the last 10 seconds, that's easy to do. But if you say, time out if you're not finished within five minutes, that's good for small images but horrible for 20-gigabyte images.
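A minimal sketch of that progress-based timeout: fail only when no bytes arrive within the window, so a 20-gigabyte image that is still moving never trips a fixed five-minute deadline.

```go
package main

import (
	"fmt"
	"time"
)

// pullWithProgressTimeout fails only on stalled progress, not on total
// elapsed time. The progress channel carries byte counts and is closed
// when the pull completes.
func pullWithProgressTimeout(progress <-chan int64, window time.Duration) error {
	timer := time.NewTimer(window)
	defer timer.Stop()
	for {
		select {
		case _, ok := <-progress:
			if !ok {
				return nil // channel closed: pull finished
			}
			timer.Reset(window) // bytes arrived, extend the deadline
		case <-timer.C:
			return fmt.Errorf("no pull progress within %v", window)
		}
	}
}

func main() {
	ch := make(chan int64)
	go func() { ch <- 1 << 20; close(ch) }()
	fmt.Println(pullWithProgressTimeout(ch, 10*time.Second))
}
```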
N
I haven't read your KEP, but the other thing that can happen is you can be trying to pull the same image for multiple containers or multiple pods, and we do cache that, so only one is pulling at a time, but the others are waiting. So again, that timeout progress might need to be tied to the original pull. Just a heads-up.
A
It's important. I mean, it would be great if we can make progress on this one, yeah.
A
Thanks. So the next one we have is a PR which is talking about disabling CPU quota for Guaranteed pods. Martin, I know you are on the call.
F
Well, there are reservations about one of the pieces. This is the smallest possible patch that would work. However, there are reservations about the quota being disabled for Guaranteed quality-of-service pods that have no CPU pinning, meaning that, from a security perspective and a resource perspective, it's actually opening the hole just a bit too much. At least that's the reservation I'm hearing. I just added a comment to the PR; I have a private branch where I'm playing with a different approach that's slightly more secure.
F
The patch is also a bit more invasive, not too much; I thought it would be worse. But I basically need some guidance, probably. I mean, we have Francesco here, but we probably want to hear from Kevin, who is not here, about which approach is preferable to him, since he's the one that needs to approve it, but we don't have him here.
F
So the one thing we can discuss right now is: I can take my private branch and basically post it to the PR, but I don't want to lose the current solution just yet, so I can open a new PR and we can compare, but that splits the discussion, and I don't want to do that either. So, you know, what should we do, what should I do with the approach here? By the way, Derek, is there...
I
I was just catching up on the PR, and I was just trying to make sure I didn't misunderstand something, or maybe you could share what the concern was. I guess I would be apprehensive about eliminating the use of CFS quota for Guaranteed pods that did not have exclusive CPUs, but if the PR is restricted to pods that are already given exclusive CPUs, I actually don't understand the counter-argument to keeping CFS quota. Is there some relationship there that I might be missing, that others are raising, that would give you pause, or no?
F
No, no, you actually described it perfectly. The current PR is removing CFS quota even when there are no pinned CPUs, no exclusive CPUs, and that's just a bit too much.
I
I want to make sure I understand, because, yeah, I think that's too much. So would we all feel good on the sweet spot, then, I was saying: if you've been given...
F
...no, pins only. Totally makes sense. I think I have a solution for that, because, I mean, the current APIs don't allow that, you know, don't allow that. So I had to do a few changes; well, I had to add two methods to a couple of interfaces, and now I think it's possible. I linked my private branch there. As I said, I can either put it into the PR directly, losing the current solution, or I can open a new PR.
I
Okay, yeah, I wouldn't have any hesitation about merging a PR that eliminated CFS quota for containers that had exclusive CPUs. Oh...
F
Basically, if you have a container that has exclusive CPUs, obviously you can remove CFS quota, because you are going to be limited by your CPU affinity. You will never be able to run on CPUs that are not yours, and the parent cgroup, the parent slice, will prevent you from using them, because the parent slice, the sandbox one, is actually limiting you as well. It's the...
I
...parent, yeah. If all of the containers in the pod were Guaranteed quality of service and all used integral cores, then yes, it would be perfectly fine merging that PR to not have CFS quota restricted. If there was a Guaranteed pod that did not use integral cores, then we would be in a gray zone, I guess. But if you wanted to get to that spot, to be reaching, like, a desired outcome, I think that's perfectly fine, so...
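A sketch of the narrower rule the discussion converges on; this is a toy predicate, not the kubelet's CPU manager code:

```go
// Sketch: only drop the CFS quota for containers actually granted
// exclusive CPUs, where affinity plus the parent (sandbox) cgroup's
// quota already bound them.
package main

import "fmt"

func shouldDisableCFSQuota(guaranteedQoS bool, exclusiveCPUs int) bool {
	// Guaranteed QoS alone is not enough: a Guaranteed pod with
	// fractional CPU requests gets no pinning and must keep its quota.
	return guaranteedQoS && exclusiveCPUs > 0
}

func main() {
	fmt.Println(shouldDisableCFSQuota(true, 4)) // true: pinned, quota off
	fmt.Println(shouldDisableCFSQuota(true, 0)) // false: no pinning
}
```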
I
And I was just trying to think of the simplest thing where I'd be like: yeah, that makes sense, so you don't need it. If I have to think through the pod-with-partial-pinning case to know whether the hierarchy works as expected... yeah, maybe that's the best thing to do next.
A
We're out of time, folks. David, did you have a quick comment? No?
A
All right, folks, thanks for joining. See you all next week.