From YouTube: Kubernetes Office Hours 20210616 (EU Edition)
Description
Office Hours is a live stream where we answer live questions about Kubernetes from users on the YouTube channel. Office hours are a regularly scheduled meeting where people can bring topics to discuss with the greater community. They are great for answering questions, getting feedback on how you’re using Kubernetes, or to just passively learn by following along.
For more info: https://k8s.dev/events/office-hours
A: All right, welcome everyone to today's Kubernetes Office Hours, where we will answer your questions live on air with our esteemed panel of experts. You can find us in the office hours channel on Slack; please check the topic for the URL for more information. Before we begin, we just want to take a moment for everyone on our panel to introduce themselves, say hello, and share a little bit about themselves. We'll start from left to right. Please take it away.

C: Hi, Marcus Johansson at Equinix Metal, principal engineer on our developer relations integrations team. That's working on Kubernetes integrations, Terraform integrations, and every kind of SDK and cloud integration imaginable. We're also hiring, looking for people who want to join; it's a great opportunity.

A: All right, my name is David McKay. I am also at Equinix Metal. I'm a developer advocate and YouTube streamer.

A: Awesome, thank you, everybody. All right, now, before we get started, here are the ground rules. This is a Kubernetes event, so the code of conduct is in effect. In short, please be excellent to one another. This is also a judgment-free zone; everybody has to start from somewhere, so please help out your buddy by keeping a supportive environment in the channel.

A: Normally we do provide t-shirts; however, the CNCF store is being replenished at the moment, but we will give you a shout-out and our undying devotion. Panelists, you are encouraged to expand on answers with your experience and pro tips. And audience, you can help by pasting URLs to docs, blogs, and anything that might be relevant to the topic at hand in the channel, and you can also post your questions to discuss.kubernetes.io.

A: All right, you can also help us by tweeting, spreading the word, and paying it forward. This panel is made entirely of volunteers; if you want to join us for Kubernetes Office Hours, sit in one of these chairs, and help the community by answering questions, reach out. We're always happy to have fresh faces join us on this show. Also, a new thing we do at Kubernetes Office Hours is our community shout-out each month.

A: Our first question: I want to know why, when I remove the command, and the command here is "bin sleep 3650d", so a lot of days, the pod won't show as running. This is a CentOS image.

C: Okay, I'm muted. Well, if there's no command, there's nothing for the pod to run. I kind of wonder what error message a person would receive without a command; it would get the default entrypoint for the image, perhaps, which I guess would be the reason why nothing would be running. Maybe whatever's in the image is exiting as soon as it runs.

D: If you just want to start an OS like in a VM and work in it, that's okay for testing, maybe, and then you need the sleep. But that's usually not how you would use a container. You would try to define directly which process you are running, and try to stick to a single process, because in a pod you can join a few containers, one for each of your processes, and you don't need to run it like a VM where you just go in and run a bunch of things.

B: Yeah, I just want to say briefly, for people that are looking for more advanced use cases, building out containers from scratch, building up their Dockerfile, or maybe having multiple things running in a container: there are two or three solutions out there that are base images or very thin init processes, things like tini and a few others, that you can set as your entrypoint and that can keep the containers running. So there are some options there, depending on what you're trying to do.

A: Thank you very much. I'll just add one thing that may seem obvious to probably everyone here: Dockerfiles use the entrypoint as the command to run by default, where the command is the arguments to the entrypoint, unless the entrypoint doesn't exist, in which case the command is the command. And both of those can be overridden as well.
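As a sketch of that override behavior (image and names are illustrative), a Kubernetes manifest can replace both the image's ENTRYPOINT and CMD, which is why removing the command from a CentOS pod leaves it with nothing long-running to do:

```yaml
# Illustrative only: a plain CentOS image exits immediately because its
# default entrypoint (a shell) has nothing to keep running. In Kubernetes,
# `command` overrides the image's ENTRYPOINT and `args` overrides its CMD.
apiVersion: v1
kind: Pod
metadata:
  name: sleeper
spec:
  containers:
    - name: centos
      image: centos:7
      command: ["/bin/sleep"]   # replaces the image's ENTRYPOINT
      args: ["3650d"]           # replaces the image's CMD
```

Removing `command` and `args` here falls back to the image's own entrypoint, which for a base OS image exits straight away, as the panel describes.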
D: Yeah, is it from the nodes, or is it also the registry cleanup? Because for the node cleanup, if I remember correctly, there is a garbage collection in the kubelet that, if set up correctly, should do it for you, as long as you don't have more requirements. There's something in the thread with some details, yeah.

A: Yeah, I think, because of how I said it, in my head I was assuming it was going to be the node side, and the garbage collection in the kubelet is a really good point. Does anyone have any specifics on what that garbage collection looks like and how old the images need to be? Anyone familiar with that off the top of their heads?

E: Well, some registries, I think, let you have a policy to expire images that are older than some date, so they will be cleaned up automatically. You probably need to check whether your registry has such a feature to do that kind of garbage collection, like a policy to clean up older images.

D: Yeah, and I guess from experience, I've used Azure Container Registry. I think it has a concept of tasks, which you can use to automatically clean up images or tags that you may not be using anymore. So that might be something that can work for that specific solution.

C: I know you said off the top of your head, but I like to cheat when I can, and the first thing that came up when I googled is "garbage collection for container images". It talks about cAdvisor and how it has thresholds for when images will be deleted, and the end of this URL, which I'll add in the HackMD, says that some of these features will be replaced by kubelet eviction in the future.
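For reference, the kubelet's image garbage collection is driven by disk-usage thresholds rather than image age; a sketch of the relevant KubeletConfiguration fields (the values shown are the usual defaults, so check the docs for your kubelet version):

```yaml
# Illustrative KubeletConfiguration fragment: the kubelet starts deleting
# unused images once disk usage crosses the high threshold and keeps
# deleting until usage is back under the low threshold.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
imageGCHighThresholdPercent: 85  # start image GC above 85% disk usage
imageGCLowThresholdPercent: 80   # free space until usage drops below 80%
imageMinimumGCAge: 2m            # never delete images younger than this
```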
D: I did cheat a bit right now and checked out the way it's installed in those docs, and it seems they're basically just pulling down the images beforehand. So in the "download and install Kubernetes images" step, instead of downloading v1.18.1 as the docs do now, you would download the newer version, for example the latest 1.19. The installation, or the deployment, of Kubernetes is done using kubeadm there, but for an upgrade, what's the correct kubeadm command? I think it's just kubeadm upgrade, right? So there you would use the same Kubernetes version that you just downloaded.

D: You would most probably also want to update your flannel and ingress controllers to more recent versions, but for Kubernetes itself, I would just redo the items in those docs around "download and install the kubelet" or "download and install Kubernetes images", which is basically docker pull, save, and load, and then follow the documentation for kubeadm upgrade. I guess we can link that.
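A hedged sketch of that pull/save/load flow for an air-gapped upgrade (the image list and version are illustrative, not a complete inventory); the script only prints the commands so the flow can be reviewed before running it on real hosts:

```shell
# Sketch of an offline image workflow: pull and save on a connected host,
# copy the tarballs across, then load and upgrade on the air-gapped node.
# run() only echoes each command here; swap it for real execution once
# the image list matches your cluster.
VERSION="v1.19.11"                      # illustrative target version
IMAGES="kube-apiserver kube-controller-manager kube-scheduler kube-proxy"
run() { echo "$@"; }

for img in $IMAGES; do
  run docker pull "k8s.gcr.io/${img}:${VERSION}"
  run docker save -o "${img}.tar" "k8s.gcr.io/${img}:${VERSION}"
done

# ...copy the .tar files to the offline node (USB stick, internal mirror)...
for img in $IMAGES; do
  run docker load -i "${img}.tar"
done
run kubeadm upgrade apply "${VERSION}"
```

Because everything goes through `run`, the same script doubles as a dry-run checklist for the panel's "pull, save, load, then kubeadm upgrade" advice.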
E
I've
done
previously,
like
installation
upgrades
for
the
offline
mode
like
when
the
company's
behind
firewall,
so
you
can't
like
pull
the
images
from
internet
and
stuff
like
that,
that
time,
the
the
customer
that
I
work,
they
had
a
jfrog
artifactory
and
they
have
a
kind
of
a
mirror
feature.
E
So
on
on
the
registry
itself,
you
will
be
able
to
mirror
your
images
from
the
that
is
available
on
other
registries,
so
you
can
pull
it
like
internally
in
your
organization,
if
you're
behind
firewall
and
then,
if,
like,
I
think
the
extreme
case,
if
you
don't
have
even
that
there
is
like
you
know,
you
can
always
use
docker
save
docker
load
command
like
it's
docker
save,
will
create
a
tar
file
for
your
image
and
you
can
copy
it
like
on
your
usb
stick
and
then
bring
it
up
and
then
docker
load
from
turbo.
E
It's
gonna
create
your
image.
Obviously
you
need
to
do
some
research.
What
images
you
need
to
have
for
your
kubernetes
installation,
like
a
q,
proxy
image
or,
like
you,
know,
cubelet
image
or,
like
all
the
like
cube
api
server,
docker
image
like
so
you'll,
have
to
do
research.
It's
probably
a
little
bit
of
pain
process
for
you,
so
maybe
look
for
also
some
some
vendors
or
some
distribution
of
kubernetes.
They
already
have
pre-loaded
images
that
can
be
reused
and
stuff
like
that.
E
A
Yeah,
I
guess
if
these
are
qbdm
clusters
and
I'll
just
assume
they
are
because
I
think
that's
where
a
lot
of
us
are
these
days
that
you
know
as
far
as
the
the
host
goes,
the
only
binaries
you'll
need
are
the
upgraded
cubelet
and
qbdm
itself,
and
then
the
images
like
everyone
else
has
mentioned,
with
the
api
server,
the
controller
manager,
the
scheduler,
etc,
pull
them
down
load
them
onto
the
hosts
and
should
be
a
large
chunk
of
it
done
so.
Best
of
luck,
great.
E: All right, I haven't used it for a while, but we used to have a project in the Kubernetes community called Kubespray. Kubespray was an Ansible deployment of kubeadm, essentially, with a lot of different features. I just checked the repository; I don't know how up to date it is, since I haven't used it for a while, but apparently they have offline environment installations. I'll share the link, maybe in the Slack channel or in the HackMD document.

A: Or on the discuss forums. Awesome. Okay, let's jump back over to our Slack channel. I see we have a question from Long. I reckon we may need some more details there, so I'll read it out, and if you can get back to us with some more, that would be great. Long asks: what is recommended if we know the node is shut down and the pods don't reschedule? Anyone feel confident enough answering that as is, or do we want more details?
D: I had one of these issues on my last on-call shift, so I can tell you what I did. Mainly, you need to check why the pods are not rescheduling. Often, at least for me, it was that a volume, a persistent volume, was stuck and was not being moved to the new node, and that was a cloud issue, basically, that I had to resolve. It wasn't very easy to resolve, but there can be very different reasons why a pod is stuck and not getting rescheduled.

D: If you have a non-autoscaling cluster, it could also just be that there is not enough space. Sometimes it will tell you: if it's a hostPort service, say, and that host port is not available on any other host, it will usually tell you why it cannot be rescheduled. So on each pod you need to run the describe command and then see why it's not being rescheduled.

C: I like to take advantage of autoscalers and add a node before I remove a node. You know, you might end up with the same problem when you try to remove that node, but it's kind of like cycling forward: I'll just add a new node and remove the oldest one. And definitely check the health of everything that you've moved to the new node before you go killing your old node. I've lost some Longhorn clusters that way.

A: The question that popped into my mind when I read your question, Long, was: define a node shutdown for me. Was it a clean shutdown? Was it drained and cordoned, or did the node disappear? If the node disappears, those pods won't necessarily be rescheduled until a timeout has been satisfied, just in case the node magically comes back online.

D: On the timeout, I'll add a couple of things. Definitely the volume point; I ran into that before. The other things to check are your node selectors, or taints and tolerations, or affinity, like node affinity, or anything like that: perhaps the old node had certain of these properties and the new ones don't, so the pods aren't being scheduled.

D: And on what David was saying around the eviction time: there's also a timeout, because the node might come back. Maybe it comes back, maybe it's a network split, so it will wait, I think by default at least 15 minutes, or depending on what has been set up there.
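That wait is visible on the pod itself: an admission plugin normally injects tolerations for the not-ready and unreachable node taints with a tolerationSeconds window (300 seconds by default, and configurable). A sketch of what they look like in a pod spec:

```yaml
# Illustrative fragment: these tolerations are usually added automatically
# (DefaultTolerationSeconds admission plugin). Lowering tolerationSeconds
# makes pods evict sooner after a node goes unreachable; the values shown
# are the usual defaults, not a recommendation.
tolerations:
  - key: node.kubernetes.io/not-ready
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 300
  - key: node.kubernetes.io/unreachable
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 300
```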
A: Yeah, it's a lot longer than I think most people expect. Again, Long came back and said someone tested disaster recovery and just shut down the worker, so yeah, that's not going to reschedule immediately; it's going to take a substantial amount of time unless you nudge it and encourage it along. Plus, of course, all the answers we have from the panel here: are you using local host-path stuff, taints and tolerations, and so on. There's a whole bunch of things, and it's quite a challenge, but hopefully we've given you enough information that you can move that forward.

A: All right, let's move on to the next Slack question. This one comes from Mustafa, who is asking: does anyone have any suggestions for GPU partitioning other than the GPU share scheduler extender? I'm not sure what the first one is, Alibaba Container Service maybe, or whether it's set up on containerd. And any recommendations on switching from Docker to containerd for a bare-metal cluster?
B: I don't really know; I haven't done that conversion firsthand. But I've got to imagine that you install containerd and then you tell Kubernetes to leverage containerd, or rather you tell the kubelet to leverage containerd, so this would probably be a kubelet config update after you've installed the containerd binaries on whatever distribution you're using. That's what my guess would be.
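For reference, and hedged since the exact mechanism varies by kubelet version and distribution: at the time of this stream, pointing the kubelet at containerd was typically done through its container-runtime flags, for example in a systemd drop-in:

```ini
# Illustrative systemd drop-in, e.g.
# /etc/systemd/system/kubelet.service.d/0-containerd.conf
# These flags match kubelet versions current at the time of this stream;
# later releases fold the runtime endpoint into KubeletConfiguration.
[Service]
Environment="KUBELET_EXTRA_ARGS=--container-runtime=remote --container-runtime-endpoint=unix:///run/containerd/containerd.sock"
```

After reloading systemd and restarting the kubelet, the node reports containerd as its runtime.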
B: I did bare metal years ago, and it's very much like you're going node to node, upgrading and adding the things you need, the dependencies that are needed for the kubelet to use. So that should be about it; there shouldn't be anything control-plane-ish that you should really need to do. But I could be completely wrong.

A: No, I think my experience is the same there. You do need to do it node to node; you'll want to drain and cordon while you do the upgrade, then reconfigure the kubelet, restart it, and bring it back into action. You can do that node by node, and every node can run a different CRI if you want, so I would recommend that approach for the upgrade.

E: Alternatively, if it's a small footprint, you can just deploy a new cluster with the new CRI and do a kind of blue-green shift: push your CD towards the new cluster and then do a cutover. I would say once it's appropriate, that should be fine.
D: Just another note: I guess you do still have to take care of any workloads that may, for some reason, be using the Docker socket, so you may have to migrate those workloads to use something else.

D: Yeah, one thing I've also seen is that some people keep the Docker daemon up: they just move Kubernetes to containerd but still have Docker on the system, for, I don't know, a legacy CI/CD pipeline where they need to mount the Docker socket, or, I think in some versions of the AWS CSI or CNI plugins, you still need the Docker socket. I've seen some cases where they just keep it on the OS and don't completely remove Docker, as long as you don't have big security concerns around it.

C: I like the ephemeral cluster approach, where you just move everything to a new cluster, but if you don't want to try that, maybe you could try doing the same thing with nodes: again, just adding new nodes that use the new container engine and then ditching the old Docker ones.

A: Yeah, I guess it depends how bare metal their bare-metal cluster is and what flexibility they've got in capacity, but those are all really, really good options. What about the GPU partitioning? Does anyone have any advice on that? I don't think I've ever touched a GPU workload on Kubernetes, so I'm really not the one to ask.
D: I've never done it myself, but I actually know the people, or some of the people, behind the Alibaba Cloud GPU share scheduler, and that one is actually kind of the one that is used most these days. I've heard there is another one, which is a fork of the NVIDIA Kubernetes device plugin; I'll post the link in a bit. I found it at some point, but it has also not been touched for a long time.

D: However, if the Alibaba one is not working on containerd yet, then most probably they will be working on that too, because Kubernetes is deprecating anything that still relies on the dockershim, so I would expect the Alibaba people to also move away from that. Otherwise, I guess they would be open to an issue and to talking about that, because the plugin looks good and looks pretty widely used; it has quite a few stars and seems very well maintained.

A: All right, thank you for that. I guess that also shows my GPU experience, because I thought that was Alibaba Container Service, so thank you for correcting me there. Today I learned. Okay, let's move on with the Slack questions. We've got one here from Vamber, who says: hi everyone, I'm currently working for a company that has around 50,000 workloads running on Kubernetes. Recently, the main objective is to improve CPU efficiency.
A: We ran into a pretty interesting problem: some of our software engineers actually used cpuset within their code, which we believe would cause the corresponding container to monopolize the CPU, and this is bad for sharing the CPU between pods. Is there a more efficient way to quickly filter out the pods which exhibit this cpuset behavior? I'm not sure what cpuset behavior is; does anyone here know what that is and want to give us a TL;DR?

B: Okay, yeah, I just didn't want to overshadow somebody, but yeah. cpuset, and I'm pretty sure I used this back in college briefly when learning about these things, is basically a mechanism in the Linux kernel that lets you set the CPUs to leverage for an application. So in Kubernetes land this is not something you want to be using in your code at all; that's not going to work.

B: Well, I think really the thing you should start doing is asking: how can we get developers to think differently about the environment that their application runs in? And the other part of this is that there is an option, I forget what you pass now, but you can actually have specific pods take dedicated CPU cores.

B: Someone, like me, will link the documentation to do this in the office hours chat, but maybe that's a short-term thing that you're able to do, where you actually grant their services dedicated CPU cores, and then they can cpuset on those cores.

B: If that's something that might work. But I think to really get the proper multi-tenancy and shared resources you're looking for, you're going to have to get developers to stop using cpuset. I'm interested to know what's actually happening with the things that are using cpuset: is the runtime's cgroup configuration actually allowing them to take certain CPUs, or is it just a facade and they're not actually able to pin or have affinity to any actual cores? It just looks like they can, so the developers aren't getting what they want with that being set, and you're not getting what you want. So there's a lot here to work through.
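The "dedicated cores" option mentioned above is the kubelet's CPU manager; a hedged sketch (field names per the kubelet documentation, values illustrative). With the static policy, a Guaranteed pod that requests a whole number of CPUs gets exclusive cores:

```yaml
# Illustrative: enable the static CPU manager policy on the node.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static
---
# A container only gets exclusive cores if its pod is in the Guaranteed
# QoS class (requests == limits) and it requests an integer CPU count.
apiVersion: v1
kind: Pod
metadata:
  name: pinned-worker
spec:
  containers:
    - name: app
      image: example.com/app:latest   # illustrative image
      resources:
        requests:
          cpu: "2"
          memory: 1Gi
        limits:
          cpu: "2"
          memory: 1Gi
```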
D: They also mentioned they're not sure how to find all these containers that are using cpuset, because it's in the code, or it's on the kernel. Would something like Falco maybe help there? I think with Falco you could define a rule that would find all of these for you in an audit.

E: I just want to say that 50,000 workloads seems like a Twitter-level, I don't know, it's an allegedly large deployment, right? So obviously they need to figure out some rules around how to use this large cluster in a multi-tenant environment. So definitely policies, and Falco would be great, but I also want to call out that Kubernetes has the quality-of-service functionality, right?

E: So potentially, if those are important workloads, you need to make sure that your QoS class is set to Guaranteed, which means making sure that your requests and limits are equal to each other. And then, in terms of finding good values and benchmarking all of these things, I don't know, maybe you all have more experience, but I have recently played with the VPA, the Vertical Pod Autoscaler, and it has a feature called the recommender, so it doesn't actually change your CPU requests and limits.

E: But it recommends good values based on the history of your container runs, so you can actually have an out-of-the-box benchmark for your containers, and then you can still use HPA, but use VPA just to recommend good values for requests and limits. Hopefully that helps, because with such a large number of containers, at least this process will be semi-automatic.
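A hedged sketch of running the VPA purely as a recommender (API version per the autoscaler project; the target name is illustrative). With updateMode "Off" it records recommendations in its status without evicting or resizing anything:

```yaml
# Illustrative VerticalPodAutoscaler in recommendation-only mode.
# Read the suggested requests from `kubectl describe vpa app-vpa`
# (status.recommendation) instead of letting the VPA apply them.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app            # illustrative target deployment
  updatePolicy:
    updateMode: "Off"    # recommend only; never evict or mutate pods
```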
D: It's been a while since I've looked into it, but as far as I know it runs VPA internally, and it runs certain experiments: I think it tries to set a certain value, try it out a bit to get more metrics, and then it gives a recommendation, like "this is what we found would be a perfect setting for your workload". Maybe someone else is more up to date on that; I haven't looked at it for almost a year.

E: Yeah, I haven't checked, to be honest, how Goldilocks works, but I found it very interesting. I think it provides a UI in addition, so you can have better visibility in terms of how the CPU and memory get used.

E: Obviously there are some other options too: if this person is running on Google Cloud, there is new functionality called GKE Autopilot that is automatically going to size your workloads and your nodes based on the workloads you're running, so that is a fully automated mode; you don't really need to worry about it.

D: One more thing to maybe consider: if you can use seccomp profiles to just block this functionality for all of the workloads, then, if that's what you're trying to do, you don't have to go back and identify all the code and workloads and change the code. So that might be another, cleaner option to investigate.
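A hedged sketch of that seccomp idea (assumption: the pinning goes through the sched_setaffinity syscall; the profile name is illustrative). A custom profile can make that one call fail while allowing everything else:

```json
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    {
      "names": ["sched_setaffinity"],
      "action": "SCMP_ACT_ERRNO"
    }
  ]
}
```

Placed under the kubelet's seccomp directory (for example /var/lib/kubelet/seccomp/deny-affinity.json), it can be referenced per pod via securityContext.seccompProfile with type Localhost and localhostProfile deny-affinity.json; test carefully, since some runtimes and languages call sched_setaffinity during normal startup.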
C: So even if this user is able to jump to different CPUs, they should still be limited to the same overall CPU percentage. But yeah, Barco linked to that seccomp profile configuration, which is at a pod level; I do wonder if there's a way to set that at a namespace level.

D: Valid also just mentioned that cpuset is usually used to pin CPUs, and to do that within Kubernetes you would need a specific HPC scheduler. He also mentioned one by Univa, now part of Altair; they might have something there, but I've seen other HPC scheduling solutions for Kubernetes too.

A: All right, thank you. Well, we would love to have you join the panel one month; I'm going to reach out to you, because you've got a lot of experience there to share. And just because there was an acronym, and I don't like acronyms without at least one person saying what they mean: HPC is high-performance computing, for anyone that isn't familiar.
A
Okay,
let's
see
what
else
we've
got.
We've
got.
Oh
another
burmel
question,
so
ashish
is
asked
and
bare
metal,
multi-master
h,
a
cluster
setup.
Aj
proxy
is
distributing
load
between
all
of
the
masters.
How
shall
traffic
be
rooted
for
deployments
on
multiple
nodes?
A
A
A
All
right,
I
guess
I'll-
have
a
go
at
this
one,
so
there
are
a
couple
of
ways
to
do
a
highly
available
control
plane
on
kubernetes
on
bare
metal.
You
need
to
either
use
gratuitous,
arp
or
bgp.
I
would
encourage
you
to
go
down
the
bgp
route
and
advertise
the
highly
available
control
plan
ip
from
each
of
your
control
plane
nodes.
A
When
you're
moving
on
to
workloads
running
on
the
cluster,
you
probably
want
to
go
down
the
same
route
again.
Bgp
is
a
pretty
solid
option
here
you
can
use
metal
lb
for
this
or
cube
vip
and
both
which
can
advertise
the
addresses
of
your
load.
Balancer
ips,
how
you
get
those
ips.
It
comes
down
to
your
setup.
I
can't
really
give
you
a
lot
of
advice
there
and
if
you
want
to
provide
more
details,
just
feel
free
to
reach
out
to
me
on
the
kubernetes
slack
and
I'm
happy
to
chat
about
that
all
right.
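As a sketch of the MetalLB side (addresses and peer details are illustrative, and this is the ConfigMap format MetalLB used at the time of this stream; newer releases moved to CRDs):

```yaml
# Illustrative MetalLB BGP configuration: peer with the top-of-rack
# router and hand out LoadBalancer IPs from a routable pool.
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    peers:
      - peer-address: 10.0.0.1     # illustrative ToR router address
        peer-asn: 64501
        my-asn: 64500
    address-pools:
      - name: default
        protocol: bgp
        addresses:
          - 192.0.2.0/24           # illustrative LoadBalancer pool
```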
A: All right, Karim asks: hello, I have a bare-metal cluster with eight nodes, three control planes and five workers. If I shut down one worker node, this StatefulSet never recreates its pods on another worker node. If I query the Kubernetes API, it shows that the pod is running. They've waited more than 30 minutes, and the pods never go Unknown or Terminating. So this is a little bit similar to, I think, the second question we tackled today. Did someone else want to add anything to that? Well, I see Long has shared something.

D: I mean, StatefulSet definitely leads me to think it's something related to volumes, yeah. So I would definitely dig deeper into that and see if that's the issue.

D: I've seen this with GPU-style home clusters. There was this orange Ubuntu box back in the day, like a year or two ago; not sure if those are still a thing these days. NUCs are an option, as they support GPUs, although for some machine learning workloads you won't need GPUs, so most probably the more recent Pis, like the Pi 4s with eight gigs, and enough of them, should be fine.

D: I think both Alex Ellis and Lucas Käldström have very good Kubernetes and Raspberry Pi cluster tutorials, and then on top, depending on what kind of machine learning you're running, Kubeflow might make sense. However, Kubeflow is also not trivial to install, let's say; I've heard they're making it easier.

A: Yeah, I think you're right. I think with their latest release they started providing a Helm chart for deploying Kubeflow, so it should be a lot easier. Whether the workloads will run well on a Raspberry Pi I'm not entirely convinced, but that doesn't mean it's not possible, and, as was mentioned, you may not need a GPU depending on what kind of machine learning stuff is happening. And there are some more links for you in the chat. All right.
A: Our next question here is: please consider the below scenario. There is a cluster with three worker nodes, and there's one deployment with three replicas, which have access to a PVC, a persistent volume claim. Suppose, and I'm trying to translate as I read, the deployment is an nginx web server, and I have manually added a virtual host configuration to the pod on worker one, and only worker one.

D: Let's say using a persistent volume is not needed there, and most probably it wouldn't work anyway, because most persistent volumes cannot be used in a shared read mode; it depends on your storage. The easiest, and most probably the best, way to solve this would be to put the site conf in a ConfigMap and then use that in your deployment, because then you get it automatically mounted by Kubernetes into each of your pods, and you don't need to worry about any volume mounting or remounting.

E: I think the confusion is coming in here because in Docker we're used to using a volume for this, but in Kubernetes it maps to a ConfigMap, which you can mount as a volume as well. So look into the ConfigMap documentation; your configuration will be added to each container, and it will be automatically reloaded if you modify the configuration. So I don't think a persistent volume is a good fit for that; I think volumes are good for other things.
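A hedged sketch of that pattern (names and vhost contents are illustrative): the vhost file lives in a ConfigMap and is mounted into every replica, so no per-node editing is needed:

```yaml
# Illustrative: ship the nginx vhost to all replicas via a ConfigMap.
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-vhost
data:
  domain1.conf: |
    server {
      listen 80;
      server_name domain1.com;
      root /usr/share/nginx/html;
    }
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels: { app: web }
  template:
    metadata:
      labels: { app: web }
    spec:
      containers:
        - name: nginx
          image: nginx:1.21
          volumeMounts:
            - name: vhost
              mountPath: /etc/nginx/conf.d   # nginx includes *.conf here
      volumes:
        - name: vhost
          configMap:
            name: nginx-vhost
```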
B: Yeah, it should just be the standard. So for people that don't know, a Service object provides standard, basic load balancing, and when we say that, what we actually mean is a very simple round-robin, very kind of dumb: if you have a list of endpoints, each new request just goes to the next one, and it keeps looping. That's the simplest type of load balancing; there are other options with other ingress controllers and things like that, but the default is using iptables.

B: I think IPVS provides some other options for load balancing as well, but that's kind of the default. So once things get into the cluster, something comes into the ingress controller, and the ingress controller uses the Service object to contact the service; that's all just round-robin. A service mesh can help you do many other different patterns with that as well, which could be useful depending on your application. So, yep.

D: So just a quick follow-up: yes, Kubernetes will perform load balancing. There is that last question here, "what if three simultaneous users type domain1.com": I'm not sure what exactly is meant, but perhaps they're looking toward things like preserving sessions. So there is, I believe, an externalTrafficPolicy field that you can specify in your service specification, and, if I remember correctly, you set it to Local and that will preserve the source client IPs, so that may be related to that question, to look into.

A: Yep, there's also a sticky sessions flag on the Service, which will allow you to get routed to the same pod for future requests. All right, and the last part: if they ever shut down one of the workers, is the state still accessible via the other nodes? The short answer is yes, it should be, and we hope so. Maybe reach out with more comments on that so we can make sure we tackle it correctly.
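A hedged sketch of both knobs mentioned above (the service name is illustrative): sessionAffinity pins a client to one pod, and externalTrafficPolicy Local preserves the client source IP by only routing to pods on the node that received the traffic:

```yaml
# Illustrative Service: ClientIP affinity makes repeat requests from the
# same client land on the same pod; externalTrafficPolicy: Local keeps
# the client source IP, at the cost of skipping nodes with no local pod.
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: LoadBalancer
  selector:
    app: web
  ports:
    - port: 80
      targetPort: 80
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800   # default affinity window, 3 hours
  externalTrafficPolicy: Local
```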
C: Yeah, I wonder how you determine that the container is still running. If you're on the node and you see that the container exists, that doesn't mean it's still running. I do wonder how you delete a pod and end up with a container still existing but not running.

D: I'm wondering, maybe it's an RBAC thing, and perhaps they think it is deleted, but it hasn't been. I mean, you would get feedback saying that you have no permission, but perhaps that's it; perhaps you don't have permission to delete them. I don't know, I'm just throwing out ideas.

D: I mean, if the pod really went away completely but it's still running in Docker, it might also be an issue of an old version and a bug that you're hitting, because I would guess that if you're running Kubernetes 1.13, you're most probably also running pretty old OS and Docker versions below that, so these might just be known issues you're hitting there. Upgrading, or moving to a cluster that is still within support, might help a lot; most issues are really just things that we've been hitting in the community for a long time, and hopefully, most of the time, they have been addressed in newer versions.
A
All right, well, we didn't have a lot to work with, but I think we gave a lot of different possibilities there, so we hope that really helps. Okay, let's move on to our next question. This one is an EKS cluster, and this person is asking: they want to be able to mount an EFS volume to the pod, and they want to be able to download certificates to that pod. They've mentioned their use case is: I want to connect to an external system, and I need a certificate to do so.
C
Yeah, an init container, an init container that just pulls down the certificate. I was trying to figure out how the certificate is being used and where it's being used. If you want the certificate to be part of a ConfigMap that this pod is dependent on, then maybe you need something external to this pod to do the work.
A
Yeah, it looks like they're pulling the certificate from JFrog as a JKS certificate. I don't know if anyone's familiar with that toolchain or not; it's not something I'm familiar with, so I'm not entirely sure what they're trying to do. But yeah, it sounds to me like an init container to pull down the certificate and stick it into a volume that's shared across into the main pod would be their best bet. I'm not sure.
D
I guess, I mean, I don't have experience doing something like this with EFS, but I'm not sure you need the EFS here. I guess what I've done previously that sounds similar to this is: I've connected to an external system to download the certificate as part of my workload's startup.
D
I doubt it's serving it in JKS format, so you would then convert the certificates as part of your startup to JKS format and then import them into your Java keystore. That's what I've done for some workloads a while ago, and I didn't have to use any sort of persistent volumes or anything such as Elastic File System or EBS or anything like that.
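A rough sketch of the init-container-plus-keytool approach described above, assuming the certificate is fetched over HTTPS and the app reads a JKS truststore. The URL, image names, alias, and password here are placeholders, not details from the question:

```yaml
# Hedged sketch: an init container downloads a PEM certificate, converts it
# to JKS with keytool, and shares it with the main container via an emptyDir,
# so no EFS volume is needed. All names below are illustrative placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: app-with-jks
spec:
  volumes:
    - name: certs
      emptyDir: {}        # scratch volume shared between the containers
  initContainers:
    - name: fetch-cert
      image: eclipse-temurin:17   # any image that ships curl and keytool
      command: ["sh", "-c"]
      args:
        - |
          curl -fsSL https://certs.example.com/service.pem -o /certs/service.pem
          keytool -importcert -noprompt \
            -alias external-service \
            -file /certs/service.pem \
            -keystore /certs/truststore.jks \
            -storepass changeit
      volumeMounts:
        - name: certs
          mountPath: /certs
  containers:
    - name: app
      image: my-java-app:latest   # placeholder application image
      volumeMounts:
        - name: certs
          mountPath: /etc/certs
          readOnly: true
```

The main container could then point its JVM at the generated file, for example via `-Djavax.net.ssl.trustStore=/etc/certs/truststore.jks`.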
D
It was all essentially just connecting to an external service, getting the certificate, doing a quick script to convert it to JKS, and then importing it into the certificate store. So I'm guessing if you're using JKS, you're probably already using something like keytool to import those into your truststore and keystore. That's my suggestion: I would check if you really need to complicate things with EFS.
E
This looks in general like a typical service mesh case, so maybe they also need to read up on service meshes and see how they can use them here. It's connecting to services, essentially.
D
It does sound like it, but I'm wondering if this is a third-party Java application that requires something like node-to-node encryption between its own instances, because that's exactly the scenario I've run into: it was a clustered application that had its own ZooKeeper and cluster management and all of that, and each node had to have mutual TLS, but it wasn't individual services. So it may not be a perfect fit for a service mesh; at least to me, it seemed like a service mesh wouldn't be the ideal solution for that.
A
All right, thank you both. Well, someone says in chat: can't the TLS certificates live inside of a Secret? That is another good option, but depending on the rotation velocity, if it's a short-lifetime certificate, that may be slightly more cumbersome or require an additional operator, I would imagine. So yeah, lots of options.
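For completeness, the Secret option from chat would look roughly like this. This is a minimal sketch with placeholder names, and the PEM payloads are stand-ins for real material:

```yaml
# Hedged sketch: a kubernetes.io/tls Secret mounted into the pod instead of
# using EFS. Names and PEM contents are illustrative placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: external-system-tls
type: kubernetes.io/tls
stringData:
  tls.crt: |
    -----BEGIN CERTIFICATE-----
    (PEM certificate data goes here)
    -----END CERTIFICATE-----
  tls.key: |
    -----BEGIN PRIVATE KEY-----
    (PEM key data goes here)
    -----END PRIVATE KEY-----
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: my-java-app:latest   # placeholder application image
      volumeMounts:
        - name: tls
          mountPath: /etc/tls     # files appear as /etc/tls/tls.crt and tls.key
          readOnly: true
  volumes:
    - name: tls
      secret:
        secretName: external-system-tls
```

As noted above, short-lived certificates would still need something, such as an operator or cert-manager, to keep the Secret refreshed.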
C
I wasn't familiar with EFS, and I'm still not, but a two-second Google says it's more like an NFS, where it offers shared access. So I can also imagine there are race conditions where all of your pods are kicking off and starting up and looking for the certificate, and only one of them is going to have the container that primes this TLS cert. And then you're also going to have this renewal problem, where some pods are trying to read from it while others are trying to update it.
A
All right, I think we'll move on from this one. EFS is NFS managed by Amazon, but we won't talk about using NFS in production. The chat also throws out SSM as an option, and I can't remember exactly what SSM stands for, so I'm going to say it's "super secret secure management" or something. I know it's the key manager on Amazon; I can't remember exactly what it is, but it would also be another option for something like this. Does anyone know what it means? It's secrets management.
A
Systems Manager, there we go. I prefer "super secret manager"; let's go with that. Okay, let's move on. We've got time for maybe just one last question. So thank you to everyone that did submit a question; it's been good fun.
D
I think we implemented our own with some hooks on AWS, for example; on premises you're most probably doing it yourself. So yeah, I'm not sure how they're really rebooting the node or how much control there is. But if there's some kind of manual control, or automated control that you have access to, then doing the drain before that will help.
C
I also wonder how that run volume is being attached to the pod, and whether possibly it's not.
D
So it's usually an init container of the CNI that does it: the CNI DaemonSet runs an init container that puts the CNI file in that folder, and then it works, if I remember that correctly from Calico, and flannel should work similarly.
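The pattern just described, very loosely sketched. This is not the actual flannel or Calico manifest; every name here is illustrative:

```yaml
# Hedged sketch of a CNI DaemonSet whose init container drops the CNI config
# onto each host. Real plugins (Calico, flannel) do roughly this, with
# different images, extra permissions, and more config. Names are placeholders.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: example-cni
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: example-cni
  template:
    metadata:
      labels:
        app: example-cni
    spec:
      hostNetwork: true
      initContainers:
        - name: install-cni
          image: example/cni:latest          # placeholder image
          command:
            - sh
            - -c
            - cp /opt/cni-conf/10-example.conflist /host/etc/cni/net.d/
          volumeMounts:
            - name: cni-conf-src
              mountPath: /opt/cni-conf
            - name: cni-net-dir
              mountPath: /host/etc/cni/net.d
      containers:
        - name: agent
          image: example/cni:latest          # placeholder image
      volumes:
        - name: cni-conf-src
          configMap:
            name: example-cni-config         # would hold 10-example.conflist
        - name: cni-net-dir
          hostPath:
            path: /etc/cni/net.d             # where the kubelet looks for CNI config
```

Because the config lands on the host via a hostPath mount, it survives pod restarts, and the DaemonSet re-primes it after a node reboot.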
A
All right, well, there's a link there in Kubernetes Discuss to our panelists that I'll share in the Kubernetes office hours channel. I know we've just slightly gone over by a minute, so I'm keen to get you all back to your day; I'm sure you've all got other calls. But thank you so much to everyone here for joining us today, and for bringing your expertise and knowledge and sharing that with everyone in the channel and watching on YouTube.