From YouTube: Kubernetes SIG Node 20230124
Description
SIG Node weekly meeting. Agenda and notes: https://docs.google.com/document/d/1Ne57gvidMEWXR70OxxnRkYquAoMpt56o75oZtg-OeBg/edit#heading=h.adoto8roitwq
GMT20230124-180453_Recording_1600x720
A
Hello, hello: it's the SIG Node weekly meeting, it's Tuesday, January 24th, 2023. Welcome, everybody. As usual, we want to start with acknowledging some work happening in SIG Node. We have 14 PRs merged, some of them related to SIG Node directly, and many that just touch SIG Node a little bit. You can review what's happening, and there are a few closed ones; I think one of the closed PRs will be discussed by Jack, it's on the agenda today, yeah.
A
If you're interested in what's going on, check out these links. The created PRs also show what people are creating and what kind of work is happening. They're all interesting links, and I hope they make things feel less formal.
A
The first agenda item today is the sidecar containers KEP. I wanted to give an update on the working group that we're running. There is a KEP that has been submitted on GitHub; you can review it and give your feedback. I think the feedback so far is very positive. We still have one unresolved naming issue that we need to solve, but other than that the feedback is very positive and I think we're on a good track going forward.
A
I want to give some update on what it will be, so that before you even start reading this KEP you know what to expect. I prepared a few slides. So, sidecar containers: the sidecar pattern was published a long time ago, almost when Kubernetes started. The idea is that you run some small container alongside your regular container. It runs in the same namespaces and it is very close to what your container has: it has the same access to similar resources, and that's how it can be useful.
A
You can get resources, you can extract resources, and the best part is that you can configure the sidecars the way you want: you can version them the way you want, you can deploy them the way you want. So there are many benefits to having a sidecar, and sidecars are quite easy to implement for regular web services: you just run another container alongside your container, and that's pretty much it.
A
Okay, thank you for seeing me. What do you see?
A
Okay, I don't know what to do, I never had this button before. If I try to share my whole screen like this... let's make a final attempt, and if it doesn't work, it doesn't work. Oh, I can just speak through the slides.
A
Yeah, I posted the link into the meeting notes. Okay, if you want to follow along, just click on the link in the meeting notes and you'll have the slides. I hope the sharing permissions are, yeah, open to everybody. Well, yeah, the pattern is well known and it works generally fine. Now going to slide number three: this implementation of sidecars falls short in some specific scenarios.
A
Mostly people have problems with jobs that run to completion. If a job runs to completion, it typically has a restart policy of OnFailure or Never, and when the main container terminates it expects the sidecar to also terminate by itself, and then the pod will be completed. But sidecars generally don't know about any other container, so they don't know when to terminate; they are typically designed as daemons that run forever.
A
So to solve this problem, we need to have some built-in primitives. Today people are solving it with all sorts of hacks, but having a built-in primitive would really help. Then we have a problem with service mesh. A service mesh allows you to configure your pod to have only mTLS communication, for instance, so you can have inbound and outbound traffic protected.
A
Unfortunately, you can't run the service mesh during init containers' runs, so you can protect everything except init containers, and that's not very good from a security standpoint. That's something we also wanted to solve. And there is the general pattern when you collect logs: many log collection daemons want to be implemented as an endpoint where you send logs to, and if you implement it this way, then you can't get logs from initialization or from the early stages of container startup.
A
It would be much easier to implement with a sidecar running very early, running through the initialization stage and through all the other containers' startup. So, having all these problems, and with the sidecar pattern being extremely popular, we want to solve this problem and make a built-in primitive in Kubernetes to support it. So the proposal, going to slide four, I see many people switched to the slide, is to have a restart policy of Always that can be applied to individual containers.
A
In this case, you can see on the right-hand side there is an init containers collection; it's a list today, an ordered list. You can define some init containers that download certificates that will be used by a sidecar container, and then there is an istio process that will be started up. By specifying this restart policy Always in the init containers collection, we will make it such that it will not be terminated; it will keep running for the lifetime of the pod.
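For illustration, a minimal sketch of the pod spec shape the proposal describes (the restartPolicy: Always field on an init container is the one from the KEP; all names, images and port numbers below are made up for the example):

apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar            # hypothetical example
spec:
  restartPolicy: OnFailure          # pod-level policy, e.g. for a Job
  initContainers:
  - name: fetch-certs               # ordinary init container: runs to completion first
    image: registry.example/cert-fetcher:latest
  - name: istio-proxy               # "sidecar": the new per-container field keeps it running
    image: registry.example/proxy:latest
    restartPolicy: Always           # proposed in the KEP; not terminated after initialization
    readinessProbe:                 # its readiness gates the pod, as discussed later
      httpGet:
        path: /healthz              # hypothetical health endpoint
        port: 15021
  containers:
  - name: main-app
    image: registry.example/main-app:latest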
A
This is the proposal, and it fits quite nicely to solve all the problems. At the same time, it's a very targeted and small change that doesn't introduce any new collection or anything like that. So we believe it will work best for most scenarios.
A
Lastly, if you look at the service mesh implementation on slide number five: as on the previous slide, you'll probably get some certificate, maybe even using different containers. Some implementations use the sidecar container itself to download the certificate, some implementations use other containers to download the certs, but whatever way you do it, you do some pre-initialization, and then a container will update the iptables for the pod.
A
So all outbound connections and inbound connections will go through the proxy, and then we start the proxy here, and finally, when this proxy has already started and is ready to be used, you proceed with all the other containers' initialization.
A
So if you have another init container doing some file download, this download will already go through the proxy. And then, when the regular containers start, only then do you need your sidecars to be ready; all other containers at some point will become ready and will start receiving inbound traffic, and this inbound traffic will go to the same proxy that you started at an early stage.
A
And finally, if the proxy fails, we will restart the proxy, so it keeps running, and while it is restarting, the whole pod will be not ready, because you have a readiness probe on this sidecar container, and all the inbound traffic will not be sent to your pod. Outbound traffic will fail because the proxy is not running, so you'll need to deal with that in your application somehow. And finally, on termination.
A
Some more highlights, very interesting and very unusual for containers today: sidecars will be restarted even during pod termination. So if you have a very long grace period for your pod termination and during this long time the sidecar happens to fail, we will restart it, so you don't lose connection to the outside world in the case of a service mesh. We also will solve some of the OOM score adjustment problems we have today.
A
The OOM score is calculated based on the percentage of a container's request, and sidecars are typically super small, so sidecars are often the first target for the OOM killer. And that makes pods a little bit more unstable, because you have a huge pod that requires the connection, and this connection gets terminated because the OOM score adjustment is so unfavorable for small sidecar containers in general.
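For context, the kubelet's OOM score adjustment for burstable containers is derived roughly from the fraction of node memory the container requests; this is a simplification of the kubelet logic, not a quote from it:

  oom_score_adj ≈ 1000 - 1000 * (container memory request / node memory capacity)

So a sidecar requesting, say, 0.1% of node memory ends up near 999 (killed first), while a main container requesting half of the node ends up around 500, which is why the tiny sidecar is usually the first OOM-kill victim in the pod.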
A
We will try our best to keep sidecars running, but we wouldn't go further than that. I mean, we had other ideas for how to do it, but all those ideas were rejected because they make sidecar containers too special, and we don't want to make them too special; otherwise everything will end up implemented as a sidecar, and that's not the side effect you desire.
A
Finally, I wanted to go through other scenarios for the restart policy field. Since we'll introduce a restart policy per container, we may use other values of restart policy going forward, and there are some scenarios you could implement that are not possible today. One is for jobs that should run once to completion: we may have some initialization that needs to be restarted, maybe it's flaky, so this will become possible if we implement a per-container restart policy.
A
It's all part of this KEP, but it may be introduced later. As another example, a job with two containers running with different tolerances for restarts would also become possible to implement. There are some requests for this functionality, and it can be done once we have a restart policy per container.
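A minimal sketch of one such future scenario; this is explicitly not part of the current KEP, and the OnFailure value on an init container is a hypothetical illustration of what a per-container restart policy could enable later:

apiVersion: v1
kind: Pod
spec:
  restartPolicy: Never              # the Job's pod runs once to completion
  initContainers:
  - name: flaky-initialization      # hypothetical: retried on failure even though the pod never restarts
    image: registry.example/init:latest
    restartPolicy: OnFailure        # possible future per-container value, not in the sidecar KEP
  containers:
  - name: job-worker
    image: registry.example/worker:latest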
A
There were some alternatives that were rejected; I listed them here, but you can read more details in the KEP that I sent before, on slides number eight and nine. What we don't want to implement, which is asked for sometimes, especially for service mesh implementations, is this final scenario of a service mesh: when the sidecar crashes, we will not provide any built-in functionality to stop other containers from running, so the other containers keep running and they may try to make some outbound connections.
A
We're not going to solve that with some built-in functionality, but you can work around it by implementing liveness probes. So, for instance, you can put a liveness probe on a regular container that points into the sidecar. This way, whenever the sidecar is down, the regular container will also be killed alongside the sidecar anyway. Finally, another problem we hear a lot about for sidecars, but that we don't plan to solve, is security boundaries between different types of containers.
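A minimal sketch of that workaround, assuming the sidecar exposes some health endpoint reachable over localhost (the path and port are made up for the example):

containers:
- name: main-app
  image: registry.example/main-app:latest
  livenessProbe:                    # probes the sidecar, not the app itself
    httpGet:
      path: /healthz                # hypothetical endpoint served by the sidecar
      port: 15021                   # the sidecar's port; containers in a pod share localhost
    periodSeconds: 10
    failureThreshold: 3

Because containers in a pod share the network namespace, this probe fails as soon as the sidecar is down, and the kubelet then restarts the app container alongside it.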
A
We know that some people want to make sure that app containers are not privileged, so they don't mount the volumes and don't change iptables, while sidecars and some initialization containers can be restricted either by time or by the scope of what they can do. So you could allow the service mesh to update iptables, but no other containers would be able to do that.
A
That sounds more spread out than just sidecars, so we're not following it in this KEP, but we can try to approach it later. I switched back to Zoom and I see a hand from Dawn. Sorry, Dawn, you may have had your hand up for a long time. Please go ahead.
C
Okay, actually most of my questions were already answered, and I can follow up on the rest later. But one comment: since anyway you introduce this restart policy for the sidecar actually in the init phase, right, it's reasonably covered. So it's okay, because one of the problems we do worry about is the resources, right, because the sidecar didn't have a resource requirement as part of the pod's needs. So, anyway, now we are changing the init containers' semantics with this restart policy, right, so do think about that.
C
The user also could consider putting up the resource request there, since anyway we change the semantics, right. So if you have an init container and its request, before we felt like we don't need that request counted, those kinds of things, just because we already calculate the regular containers' resource usage.
C
The init container anyway is just there to do the preparation, so it can reuse that pod-level share of the resources, and we also don't want to be over-conservative, right, and reduce the node's resource utilization.
C
So that's why we didn't have you put that resource requirement in. But now you change that semantic for this, so maybe the user could add some, or it could also be defaulted now; but still, if you need something additional, just call it out here, because otherwise, well, we kind of heard from Istio and others about this.
A
Just on the resources topic: we covered it in detail in the KEP. We will change the formula for how we do the resource calculation, and this formula is now quite complex; it used to be, I mean, not trivial but easy to implement, and now it will be harder to implement. Because of that, we plan to expose the result of this computation either as a metric or in some pod status.
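For reference, the long-standing effective-request formula is roughly max(sum of regular containers, max over init containers); the change being described makes restartable (sidecar) init containers count toward the running sums instead of only competing in the max. A paraphrase, not the exact formula from the KEP:

  effective_request ≈ max( sum(regular containers) + sum(sidecars),
                           max over each init container i of (request(i) + sum(sidecars started before i)) )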
A
So this will be changed and updated. Also, Francesco and Swati did a good job of looking into the different managers, like the topology manager and CPU manager, to understand how sidecar containers' resource usage will need to be accounted for by those, not plugins, but different managers. The problem, as you said, Dawn, is that today some resources are being reused from init containers by regular containers, and in this case you also need to make sure that reuse is properly updated.
C
Another comment, which is a minor point at this moment: you mentioned that we are introducing, for the first time, this per-container restart policy, right, which is the change you need for the init container semantics to turn it into the sidecar container. But that also actually makes the future per-container restart policy harder, because you need to think about it together with this per-container restart.
C
How to say it: in the past we had one-size-fits-all. Init containers anyway have to start one by one, because each just does its preparation until it finishes, and then we start the second one, so you treat them as totally separate logic. And then you introduce the regular containers, and each regular container you treat with its own separate logic, right, because anyway they can start concurrently, restart separately, all those kinds of things.
C
But now you have to have additional logic to say: oh, this one is in the init list, but it has a per-container restart policy that may be different from the rest of the regular containers. So this is a minor point, it's not a blocker, but I do think the eventual per-container restart policy is important, and this makes it more complicated.
A
Yeah, and I have a section about it in the KEP. We really want to scope this to implementing only the things needed for the sidecar KEP, and all the future scenarios for restart policy we kind of put on a shelf, saying we want this in the future but we don't want to try to address them now. Implementing those would introduce all sorts of complexity, especially for certain cases.
A
For example, when you change the restart policy to one that restarts less, you need to remember state, like whether this container has already been run to completion or not, and this state is currently not preserved for regular containers. That kind of complexity we want to acknowledge exists and put in writing, but we don't want to address it and design around it yet. We will definitely test extensively for the sidecar scenarios, though.
A
Okay, thank you for your attention. I took 20 minutes of the meeting. I hope we will still have time for all the other topics; I know we have a lot. If you have more questions, please go to the KEP, review it and give feedback. Thank you.
D
Yeah, I think one comment I had as we are discussing this per-container restart policy: recently we were looking at, you know, in resize policy we have explicit values for restart container, or no restart, or restart not required. We're discussing with Tim making that more explicit, so under resize policy we'd call it the restart policy for resize, and I'm wondering if having this is going to cause some confusion, and if there is a way we can make it clearer.
D
You know, we want to have one restart policy, not two or three. I'll share the link on the KEP, and then we can discuss this offline.
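For context, a rough sketch of the resize policy field being referred to, using the shape from the in-place resize KEP at the time (the field names and values here are an assumption and may not match what eventually ships):

containers:
- name: main-app
  image: registry.example/main-app:latest
  resizePolicy:
  - resourceName: cpu
    restartPolicy: NotRequired      # resize CPU in place, no container restart
  - resourceName: memory
    restartPolicy: RestartContainer # memory changes restart the container

The naming concern above is that this nested restartPolicy and the per-container restartPolicy from the sidecar KEP would otherwise look like the same thing.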
A
That's perfect, thank you. Good feedback; naming is always hard, and let's try to make it less hard on us, but good feedback, thank you. If there are no more questions: David, do you want to go through node lifecycle?
B
Yeah, sure, yeah, thanks, the presentation was very good. I'll definitely take a closer look at the KEP. Yeah, for node lifecycle, I kind of wanted to share this issue and just get some thoughts about it, and I was curious if there was any prior work done on it.
B
So the context here is that I was chatting with Tim a little bit, and Tim filed this issue on GitHub, and we discussed it a little bit. The context is basically, you know, we have quite a good amount of work going on right now regarding pod lifecycle; I'm actually working on a doc that describes pod lifecycle in more detail.
B
We were having some discussions last week with Clayton and other folks about some improvements in this area, but one of the things we haven't really looked at too much yet is node lifecycle. The context here is that node lifecycle isn't really well documented around how different components in Kubernetes should react when nodes come and go.
B
A good example to explain this: for example, the cloud provider spins up some resources that are needed, and prior to the node being deleted, those other resources need to be deleted. So there's a little bit of a dependency there, and there's no real way in Kubernetes to signal, you know, that the node is about to be deleted.
B
Cluster autoscaler, for example, has a specific taint that it applies on the node, and then, you know, it says this node is going to be deleted. But that's very specific to cluster autoscaler; it's not a standardized taint, and if other components want to implement it, they would have to use their own taint. So what Tim is kind of proposing in this issue is: maybe we need to standardize a little bit what the node lifecycle is, and specifically its termination phase?
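For context, a sketch of what that looks like today; to my knowledge cluster autoscaler uses a taint key like the one below, but treat the exact key and value as an assumption:

apiVersion: v1
kind: Node
metadata:
  name: node-to-scale-down                  # example node name
spec:
  taints:
  - key: ToBeDeletedByClusterAutoscaler     # autoscaler-specific; other components don't know about it
    value: "1674584400"                     # the autoscaler stores a timestamp here
    effect: NoSchedule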
B
For example, for other Kubernetes objects, right, there are finalizers, and that's the standard way to block deletion of resources if there are other dependent resources. So maybe we could do something like, you know, putting finalizers on the Node object, and we would document that if there are finalizers on the Node object, the other component shouldn't delete the underlying VM, for example, right, something like that.
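A minimal sketch of that idea, assuming a hypothetical finalizer name chosen by whichever controller owns the dependent resources:

apiVersion: v1
kind: Node
metadata:
  name: worker-1                             # example node name
  finalizers:
  - example.com/cloud-resources-cleanup      # hypothetical: added by the controller that must clean up first

The convention would be that whatever deletes the underlying VM waits until the Node's finalizer list is empty, the same way deletion is blocked for other Kubernetes objects.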
C
David, we might have... they wrote a doc down, both in the open source and also in GKE.
C
Some of those might not be up to date, so reach out to Nato, and so we do have some things and we might be able to help you, yeah. But this is a good idea, because I remember this topic came up a few times last year, and we did suggest people document that and keep the new document up to date, and I didn't see people doing this. So I think this is a good thing to start.
B
There was some discussion, okay. I can reach out to see what's been done, and I think the outcome here is maybe eventually we'll try to write this up somewhere and standardize it, so that other components could follow some existing model instead of having a taint per component that, you know, no one else would know about.
B
I think some of those things are a little bit missing today, like a good signal, especially if there are other resources or other controllers that need to be watching and doing some type of cleanup and reaction to this, especially prior to the deletion instead of after the deletion. Yeah, exactly.
A
Just a couple of days ago we talked about one of the things we implement: a taint that is used for paused VMs, ones that are not killed but are still around, just not running. So that may also be an interesting discussion. And on restarts, I remember ancient documentation about a node with the same name appearing after some time, where we try to treat it as the same node, but then it's not the same. So that may also be part of the discussion.
A
Okay, the next one is... I'm sorry, please share.
F
So, we've been thinking about how to improve container security, specifically how to detect containers where files have been modified, containers that have been compromised somehow. We've looked at the IMA resources that are available in the Linux kernel, the Integrity Measurement Architecture.
F
We want that convenience inside the container, so we could actually check whether files inside containers have been modified or not. The end goal is to push this into the entire Kubernetes stack, so to change maybe the pod security context and CRI.
F
We are also talking to the OCI community and the runc community to try to put changes there as well, so they can add it. So, in a nutshell, the whole idea is that when you spawn a container, it creates a number of namespaces in Linux. We want to add a new namespace, which we call the IMA namespace, that will help us detect file changes inside containers, so that eventually Kubernetes will be able to detect when a container has been compromised.
G
It's in draft status, because the Linux kernel changes aren't merged. The kernel changes have to merge first, then we can do the spec, then runc, and then, you know, start opening things up.
F
Right. So basically, the IMA namespace is still not merged into the Linux kernel. We are working on merging it and we expect it to be merged maybe in the first iterations, maybe in a few months. We also filed an issue with runc, and the maintainers of runc told us that, okay, we will not merge anything now, not even the OCI specification changes, because we don't have support in the Linux kernel yet. But we are working on every part of the stack, including runc.
C
I totally agree. I feel like we definitely want to support it once the low-level runtime and the container runtime have this feature, right, which means the kernel supports this feature. So once that is ready, then definitely Kubernetes should be ready to support it.
C
But right now it's a little bit early. You need maybe the kernel merge first, and at least you can try to do some POC, right, on a single node: you see this namespace working for a while with runc and the container runtime on a single node, and then come back here, and then we can talk about how Kubernetes enables this. Okay.
A
Thank you. Oh, Kevin.
A
Okay, thank you. Another one, Kevin: for the pod resources API, I think there was a conversation about having a meeting. Francesco, maybe you can talk about it.
I
Hello. So, let me check... no, we are reviewing the KEP, this KEP about extending CDI, which is for the DRA use cases, and from my perspective, so far so good. The content itself looks okay to me, and I'm going to have more comments or keep reviewing in the next days. So, long story short, if there is a meeting planned, I'm not aware of it.
A
Sorry, I was thinking about the previous one. I believe there were some discussions about updating CRI and solving the problem. Is it addressed in the GitHub issue, or is there a conversation going on?
I
Just for completeness' sake, to say the right things: I'm not fully up to speed, but the KEP about CDI, that part is much simpler, and while meetings are always possible, I don't think we're going to have one, maybe.
A
Thank you. Hi Renee, what's happening with in-place pod resize?
D
Sorry about that, yeah. The status remains unchanged. I just rebased it and resolved a test failure from the rebase, and I think I see one more failure from golangci-lint; I'll fix that this afternoon Pacific time. And we, I mean Tim and Derek... is Derek here? I didn't see him in the meeting.
D
No, he's sick. Oh, he's sick, okay. Yeah, we've been trying to get together, we just need 10 to 15 minutes; we're syncing up offline to see which way to go. Do we want to merge the API and then follow up with the main, the mothership PR, or, as Tim favors, just merge the whole thing in one shot? And I'm...
A
Okay, going to the next item, it's me. I wanted to highlight this bug that was solved this week. It's about HTTP probes. The situation here is that if there are many HTTP probes and many containers defined, then we can run out of sockets, or, well, not exactly sockets. What was it?
A
What is happening: we open a socket, and there is some time frame when the connection is already closed but we keep the socket open for some time, and the default is 60 seconds. So if we have a lot of containers with a lot of probes, then we can hit some limits on the node, and once we hit these limits on the node, we start failing any HTTP connection. So it's kind of a noisy neighbor problem on steroids, and that was addressed.
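To make the scale concrete with a back-of-the-envelope calculation (the numbers are illustrative, not from the actual bug report): a node running 100 pods with a couple of HTTP-probed containers each, probed every second, creates on the order of 200 new connections per second; with each closed connection lingering in TIME_WAIT for the default 60 seconds, that is roughly

  200 connections/s * 60 s ≈ 12,000 sockets in TIME_WAIT

at steady state, which is enough to start hitting per-node connection-tracking or ephemeral-port limits.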
A
If you're interested in the details, you can read this presentation. I didn't find the recording from the SIG Network meeting, but maybe it will be up soon. So this was fixed. If you see any customers complaining about client timeouts from HTTP probes, they're probably hitting this issue.
A
Any questions? Okay. Yeah, it's very unfortunate when probes are not reliable. In this case it was a situation where HTTP probes didn't work but exec probes did work, because exec probes don't use the same resources, so yeah, it was very unusual. And we know now that if a customer has a host port and they are actively opening connections to many IP addresses, they also can exhaust node resources and make other pods unusable, so other pods' liveness probes can start failing because of that.
H
Yeah, so the first item that I have is related to topology manager GA graduation. We are close enough to the KEP freeze, so I just want to raise everyone's attention and request reviews on that. I've noticed that we haven't added milestones and the lead-opted-in label and things like that, so that was something I wanted to point out as well.
H
Perfect, thanks, Dawn. The second item I have is related to a device manager bug that I've been working on, so just to give a high-level overview of it: there's a scenario where, you know, if you were to restart kubelet or reboot your node, you have no control over how the pods are recovered, so your application pod, which is requesting devices, could get recovered before the device plugin pod has had a chance to register itself.
H
In that case, from the kubelet's point of view, we are not actually returning any errors, and until the point the application pod tries to access the device, there is no error manifested. So with this PR that I have, what I'm trying to do is make sure that we are checking that the device plugin has registered and is healthy. I would really appreciate it if people could take a look at this.
H
It's a bit complicated in terms of how we reproduce this issue, and in order to do that, especially for the end-to-end tests, I had to modify the sample device plugin, which is a test plugin used for testing device plugin scenarios.
H
The change that was made to the sample device plugin was to intentionally prevent it from registering itself. So the second point that I have is related to the sample device plugin changes, and we need to make sure that the image is promoted and accessible for end-to-end testing. Just as an overview for everyone: if people have time, please take a look at this, because I know that from next week onwards everyone is going to get busy with KEPs and stuff.
B
So just a quick question about that last thing you mentioned: if the device plugin is not available and the pod tries to come up, is the idea that the pod will be rejected during admission, or how is that failure handled? Okay.
A
Okay, I think we can go to the next one.
K
Hello, yeah. So this is just a quick update on the updating of the KEP to basically reflect the new condition that we introduced as alpha, which was PodHasNetwork: basically renaming it to something that SIG Network is more aligned with, which is PodReadyToStartContainers. This was after a pretty long discussion with Derek and Tim, where there were some concerns about the condition being named PodHasNetwork and what other things it might imply.
K
So I think we converged on a separate name, and we basically just need to update the KEP. So if someone has some time to take a look, that would be great, thanks.
D
So is the idea here that PodHasNetwork is one of the conditions that ties into the union of what defines this "ready to start containers"?
K
No. So basically, the condition was suggested as PodHasNetwork because that's the part in the code where that condition basically becomes true right now. So what we did is analyze what would be a better name for it, because SIG Network is trying to introduce this concept of multiple networks.
K
One of the main things that they're trying to do is basically say that, even though you have an IP address in the sandbox status from CRI, it does not necessarily mean that it won't change over the lifetime of the pod. So that's why they were opposed to the name, and they asked: what is it that you're really trying to say? Basically, what we're really trying to say is that the sandbox is ready, but there was also opposition to that. So the name we came up with is "ready to start containers", basically, yes.
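For illustration, this is roughly how the renamed condition would appear in pod status (the timestamps are made up; the type string follows the rename discussed above):

status:
  conditions:
  - type: PodReadyToStartContainers          # renamed from PodHasNetwork
    status: "True"
    lastTransitionTime: "2023-01-24T18:05:00Z"
  - type: Ready
    status: "True"
    lastTransitionTime: "2023-01-24T18:05:30Z"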
C
Yeah, makes sense. But to me PodReadyToStartContainers is kind of a superset of PodHasNetwork alone. So the reason they think about it is, oh, we are, what should I say, too generic, because once we have the multi-network support... this is not... let's not assume that. But then they propose an even more generic name, a superset of the name: pod ready to start containers. Anyway, I'm not good at naming.
D
It fits well if you have to have policy-based readiness, like, okay, network A and network B or network C. If we have that condition and the kubelet wants to enforce that, or CRI wants to enforce that, then it makes sense either way.
A
Okay, we're moving on. Jack, exec probe timeout.
L
All right, folks. So Dims pinged me on an issue that Antonio was driving to remove the ExecProbeTimeout feature gate, which was introduced, I think, in the 1.20 release cycle, so it's been a while. That addressed a bug where exec probe timeouts were not firing in dockershim. So, I think, Sarah, you may even remember this.
L
Like a year and a half ago, we tried to remove the feature gate, and Jordan very reasonably asked that we get some user feedback on whether that would be safe for folks, because the feature gate exists to protect backward compatibility, even though the backward-compatible behavior was sort of buggy for dockershim users. So, fast forward to today, and dockershim was removed in 1.24.
L
I noticed that 1.23 is still getting... did it get its last patch release last week? But it's going to be end-of-life really soon, so I wonder if simply the removal of dockershim from the Kubernetes ecosystem entirely gives us permission to remove the feature gate. We can go through the process of locking it to true and then waiting a few versions, if we want to do that correctly.
A
The problem on containerd still exists; it surfaces differently, but it still breaks customers when enforced, and we observed in some scenarios that enforcing it may break customers unexpectedly. So, you're referring to the situation where dockershim used to wait until completion of the exec probe and then react on the status, true or false: it does react to the result and just passes it through, even if it took longer than the timeout. The problem is that when you enforce the timeout, it suddenly becomes a failure, and it can start killing containers.
A
So, even though the functionality is not working on containerd as expected, it still works worse when you enforce the timeout, because it can break customers unexpectedly, and that's the biggest problem. So what we discussed is that we wanted to have a plan to help customers migrate, and I think you created this metric for customers to understand which timeout to set, and I don't think we introduced any documents on how to help customers transition.
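For reference, the migration essentially amounts to measuring how long the probe command really takes (for example with the new metric) and then setting an explicit timeout on the exec probe, as in this sketch (the command and values are made up):

livenessProbe:
  exec:
    command: ["/bin/check-health"]   # hypothetical probe command
  timeoutSeconds: 10                 # was silently unenforced under dockershim; the default is 1
  periodSeconds: 30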
L
Cool, that sounds fair to me. So I'll work on building up a set of documentation that folks can walk through, which aids them in how to leverage the new metric to make their own measurements in their environment, and then I'll just submit a PR and work out during the PR process where that documentation should land in upstream Kubernetes.
L
And then, after that, we can do some sort of field work and reach out to the community, give a reasonable amount of time for folks to get exposed to this new documentation and have a chance to assess their own environment. So I'm happy to own that; it'll probably take a while, so the next couple of release cycles.
A
Back to the agenda: I think it would be good to remind everyone that next Friday is the PRR, the production readiness review, deadline for KEPs. So if you have a KEP in flight, make sure you have somebody assigned from production readiness review and have them review your stuff. And then a week later is the enhancements freeze, I think it's February 10th, so maybe at the next SIG Node meeting we can go through the list of KEPs.
A
If not, then happy Tuesday, everybody, and have a good rest of the week. Bye, thank you.