From YouTube: Kubernetes UG VMware 20220407
Description
April 7, 2022 meeting of the Kubernetes VMware User group with a discussion of reporting metrics associated with the vSphere CSI storage driver to Prometheus.
A
So welcome to the April 7 meeting of the Kubernetes VMware user group. On today's agenda we've got a number of items, and if we get through all of them, as usual we'll keep going and run this as a generic Q&A and birds-of-a-feather discussion covering things related to best practices for running Kubernetes on VMware infrastructure.
A
Okay, I assume everyone can see the agenda document, and hopefully the font is large enough that it's actually readable. So, on the topic of the vSphere CSI storage driver: this came up two months ago, in February, and there were hints then of what we're talking about here. Features have been added, and the primary one, I think, is exporting metrics to Prometheus that relate to the success, failure, and performance of storage operations when you're consuming vSphere storage through Kubernetes.
A
The functionality might have been in there a couple of months ago, but it wasn't documented. Recently, meaning as recently as the last month or even the last two weeks, the docs got published, and a great blog post came out from Cormac Hogan, a member of the group, who I don't think is here today. He has specific instructions that walk through setting this up, and by setting it up I mean even hosting Prometheus and Grafana to put together a complete system with a GUI dashboard.
A
I tried to walk through this myself, but I just started yesterday and was doing it part-time, so I didn't get far enough along that I can actually demo it, but Cormac's blog post has screenshots of what this looks like. I will warn you: I was trying to follow along with Cormac's instructions, and your results may depend on your Kubernetes distribution, whether you use a commercial one, an open-source or community one, or pure upstream code that you roll yourself.
A
Some of the instructions appear to be specific to a particular distribution. When it came to his instructions on identifying where some of these CSI components live, he called out using vmware-system-csi as the namespace, but on mine it turned out to be different: I was using Tanzu Community Edition, which installs the CSI components in kube-system. Other than that, everything appeared to work as I went along. Miles and I had a little sidebar chat before the meeting officially opened, so Miles, here's my question for you.
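If you're not sure where your distribution put the driver, one quick way to find it is to search across namespaces; this is a minimal sketch, and the pod name prefix is an assumption based on the upstream manifests:

    # Find the namespace that hosts the vSphere CSI controller and node pods
    kubectl get pods --all-namespaces -o wide | grep vsphere-csi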
A
I was curious, since I had run an installer to install Kubernetes, what version of CSI was actually in there. I think that would be a common scenario: you use a commercial distribution or something with an installer and you're not really sure what version of CSI you're dealing with, and I was curious what the best method is for determining that. I poked around with a Google search and, sad to say, maybe there's room for improvement here, because it didn't seem straightforward.
B
No, you're absolutely right, there is no easy way to tell, particularly across the different Tanzu distros of Kubernetes. For example, TKGS, which is what Cormac was using, I believe, or it could have been just vanilla with a straight-up install, because it was not a pre-packaged CSI driver. When it comes to vSphere with Tanzu, the TKG Service, that runs a private, proprietary fork of the CSI driver. That's why they've got the quota mechanism and all that kind of stuff in there.
B
So that is different, and there's no release number for it: whatever you get in the Tanzu Kubernetes release for 1.21 is just what you get. You can't upgrade it; it's not supported to upgrade. It's a little different when you look at TKG multi-cloud, or TKGm, where you install a management cluster and then provision your other clusters from it.
B
There it will have the image tag, the tag on the CSI driver image itself, so that would probably be the way to check. When it comes to running the pure upstream stuff, if you have a vanilla cluster that you built yourself and you installed the CSI driver yourself, those images are tagged, for example, v2.1.
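As a rough sketch of that check, you can read the image tags off the controller deployment. The namespace and deployment name here are assumptions and differ by distribution (vmware-system-csi upstream, kube-system on Tanzu Community Edition):

    # Print each container name and image; the image tag carries the driver version
    kubectl -n vmware-system-csi get deployment vsphere-csi-controller \
      -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{"\t"}{.image}{"\n"}{end}'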
B
And I'll say, from Google's perspective: if you install Google Anthos, GKE on-prem, it uses a CSI driver, but they too forked the vanilla CSI driver and do their own builds, which have their own tweaks here and there; they change some of the failover properties of certain pods and that kind of thing. So even if you go and look at Red Hat, or at Google, or whoever, Red Hat is still using upstream.
A
I perfectly understand that, but when it comes to the CSI driver publishing release notes with new features, I think it's natural for a user to expect: my vendor chose this for me, but I'm curious what release I'm actually on, in the event I want to enable, say, Prometheus metrics. I wish it were better. One thing I thought might work was to just go look at the logs.
A
I had this theory that if I could find the pods where the CSI components live and go get the logs, I'd bet they dump the version when they start up, but it didn't appear. No, it doesn't. That might be a nice feature to add, so maybe I'll take an action item to open an issue to suggest that, because even if somebody were to fork it, I doubt they'd remove that.
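For anyone who wants to repeat that experiment, the check is just pulling the controller logs. A minimal sketch, with the namespace, deployment, and container names being assumptions that vary by distribution:

    # Look for a version banner near the start of the controller log
    kubectl -n vmware-system-csi logs deploy/vsphere-csi-controller \
      -c vsphere-csi-controller | head -n 40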
A
That said, this is a recent version, so depending on when you installed your Kubernetes and where you got it from, you may or may not have these metrics available, but I would guess they'd be coming soon if they're not there already. The metrics that I think boil to the top as useful are the success/fail ones; I think it's a three-way switch that comes out for the various operations, like volume create, delete, attach, etc., so that would seem like something useful to monitor, and then the latency of those operations.
A
Ideally these latencies should be low, but if you go look at Cormac's blog he's got a Grafana dashboard that shows them, so that strikes me as something really useful to have, and I think there are a few other things in there as well. There is documentation linked here on collecting these metrics.
A
There seem to be metrics you'd find if you were to look at the source that aren't disclosed in the documentation, or at least that was my impression as I was poking around. But hey, some documentation is better than none, which was the state it was in a couple of months ago. For whatever it's worth, I suspect this would be useful to a great many people running this in a production scenario. If the metrics are there and useful, it's almost crazy not to use them, in my mind; it could save you, yeah.
B
Some of them are gauges, some of them are histogram-type things, so you'll get op types: what part of the CRUD operation it was, or whether it's a volume expansion, a query, or a snapshot operation. It'll give you the operation and whether it passed or failed, that kind of stuff. The latency stuff that Steve was talking about, for example, is operation latency, not storage latency, which matters because they might look the same sometimes, but some operations take, say, 50 milliseconds to process.
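Once those metrics are being scraped, queries follow the usual Prometheus patterns. The metric and label names below are placeholders, not the driver's actual names; check the linked docs or your own /metrics output before using anything like this:

    # Ask Prometheus for p99 operation latency broken out by operation type
    # (hypothetical histogram metric and 'optype' label)
    curl -sG 'http://prometheus.example:9090/api/v1/query' \
      --data-urlencode 'query=histogram_quantile(0.99, sum(rate(csi_operation_duration_seconds_bucket[5m])) by (le, optype))'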
A
Another thing I want to point out: this picture that I scrolled up to shows the basic architecture, because some might not even be familiar with the terms used by Prometheus. ServiceMonitors are the things put in place to gather this information, and they feed into a Prometheus server. Then you might put something on top of that, like Grafana, to actually render these metrics in a GUI so you get the dashboard experience.
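As an illustration of that wiring, a ServiceMonitor is just a small custom resource that tells the Prometheus operator which Service and port to scrape. This sketch uses made-up names, labels, and a made-up port name; Cormac's post has the real one for the vSphere CSI controller:

    # Hypothetical ServiceMonitor pointing the Prometheus operator at a metrics Service
    # (save as csi-servicemonitor.yaml, then: kubectl apply -f csi-servicemonitor.yaml)
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: vsphere-csi-controller
      namespace: vmware-system-csi
    spec:
      selector:
        matchLabels:
          app: vsphere-csi-controller
      endpoints:
        - port: ctlr
          interval: 30s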
A
One thing pointed out by members of this group back when we covered this two months ago, I think it was Miles, Robert, and Scott, who I'm not sure has joined us today, is that in some distros there might be a packaged Prometheus, but there's a particular version with an operator that is much easier to deal with than some of the things like the packaging built into the Tanzu distributions.
B
So you can install the TKG extensions, or whatever they're called today, on top of your Tanzu cluster; that's a bunch of out-of-the-box bits and pieces to help you get a baseline for running your Kubernetes cluster. It uses kapp to install it all, and that's fine, it gets you what you need. However, it doesn't include the Prometheus operator, and the Prometheus operator is what gives you access to the ServiceMonitor CRD type, which you absolutely want to be using to discover all your metrics.
B
So if you are using the Tanzu extensions and you want to use ServiceMonitors, which you really should be using in prod for Prometheus, I would advise not using the Prometheus and Grafana that come with the TKG extensions and instead using the Helm chart, because it rolls all that stuff out. There's one called kube-prometheus-stack, I think, is the name of the Helm chart. I'll drop a link into the chat anyway, and that's the one I would advise you take a look at.
A
Take the easy path: this operator method is maybe going to save you some time and headaches.
B
I dropped the chart into the chat there, so that is the one I encourage you to have a look at. The critical piece it contains is the Prometheus operator. You could roll out the Prometheus operator yourself to your TKG cluster along with the TKG extensions and probably make it work, but to me that feels like a bit too much work, and this is always known, tested, and released as a package. So I would say seriously look into kube-prometheus-stack.
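For reference, installing it is the usual Helm workflow. A minimal sketch: the release name and namespace here are arbitrary, and you'd normally pin a chart version and supply your own values file:

    # Add the prometheus-community repo and install kube-prometheus-stack
    helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
    helm repo update
    helm install monitoring prometheus-community/kube-prometheus-stack \
      --namespace monitoring --create-namespace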
A
Okay, let me cut and paste that right into the notes doc. Well, I'll do it after the meeting, since I'm sharing the screen. Okay, here's an advisory related to CSI and the upcoming 1.24 release.
A
Just this morning I saw an announcement that the Kubernetes 1.24 release may be delayed. They had plans for a release candidate coming out very soon, this month, in a week or so, but they announced the release candidate might be pushed back a little bit, and then 1.24 is obviously likely to get delayed too if the release candidate is late. In any event, if you do upgrade, there is an advisory on CSI. This isn't just vSphere CSI; I believe it relates to CSI in general and the need to drain nodes in an upgrade scenario, and I put the link to the PR here that has more details on that. So if you live on the bleeding edge and jump on Kubernetes releases as soon as they come out, this is something you might want to keep in mind so you don't get ugly surprises.
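Draining itself is standard kubectl; a minimal sketch with a placeholder node name (the exact upgrade procedure is in the linked PR, not here):

    # Cordon the node and evict its workloads before upgrading it
    kubectl drain worker-node-01 --ignore-daemonsets --delete-emptydir-data
    # ...upgrade the node, then let it take work again
    kubectl uncordon worker-node-01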
A
Then I'll turn the stage over to you, Miles: the CSI driver now supports CSI snapshots natively.
B
Yeah, so this is something people have asked for forever, since we launched the CSI driver: CSI snapshot support. Just some points of clarity. The high-level thing is that we now support the CSI snapshot feature. It's a boilerplate feature that's part of the CSI spec, nothing special there. It does not back up your applications, and it does not guarantee application consistency.
B
It does not do any of that magic stuff. You still need to quiesce your app; you still need to make sure it's consistent before you take your snapshots. These are not backups, but the functionality is there if you need it. It takes a snapshot at the vSphere layer, then it takes a consistent copy of the data, and then it releases the snapshot at the vSphere layer. So you have a consistent copy of your data. That doesn't mean the application was shut down and quiesced properly, but you will have a copy.
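To show what that looks like from the Kubernetes side, a snapshot is requested with the standard snapshot.storage.k8s.io resources. This is a sketch with made-up resource, class, and PVC names, and it assumes a VolumeSnapshotClass backed by the vSphere CSI driver already exists:

    # Request a CSI snapshot of an existing PVC (all names here are placeholders)
    # (save as demo-snapshot.yaml, then: kubectl apply -f demo-snapshot.yaml)
    apiVersion: snapshot.storage.k8s.io/v1
    kind: VolumeSnapshot
    metadata:
      name: demo-snapshot
      namespace: default
    spec:
      volumeSnapshotClassName: vsphere-csi-snapclass
      source:
        persistentVolumeClaimName: demo-pvc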
A
And it could well be that whatever backup technology you use is prepared to integrate with this functionality in a CSI driver, but that's outside the scope of what this group gets into. There are solutions like Velero, along with restic, that might do some of these steps, and then there are packaged versions of commercial backup products too that may or may not support this yet. Since it's new, I think it's possible that some solutions aren't prepared to hook up to this yet.
B
As far as I understand, because this is just the standard CSI snapshot, part of the CSI spec, any backup app that you have today, take Velero or Kasten or whatever it is you use to do your cloud native backups, should just work with this if it supports CSI snapshots and that part of the spec. There should be no proprietary integration needed to support this feature.
E
That is to say, we put a UI around Velero, but yeah.
B
Yeah, so we've talked about this at length in other meetings and other forums: should you do stretched clusters with Kubernetes? My personal opinion is absolutely not, though we do have a lot of customers that keep asking us for it anyway. So, these...
B
That's a good starting point, let's do that. A stretched cluster is when you have two compute clusters on different sites or in different buildings or whatever, and you have a single storage array, or what appears as a single storage array: a single logical storage array backed by two physical devices that synchronously replicate to each other. That means every I/O operation is completely blocking.
B
So if there's a VM on site A and it writes to the datastore on site A, it does not get an acknowledgement that the write happened until the write goes to the other side, an ack comes back, and both sides acknowledge that it was a successful transaction. That's synchronous replication, and a stretched cluster essentially does that across multiple sites.
B
For years and years that was a way to protect applications that had no ability to shard or spread or replicate data themselves. Think about your Microsoft SQL Server, although that's a bad example since it does have that; let's say MySQL, or any of those more traditional databases. They don't have built-in replication or sharding or the ability to scale out or anything like that.
B
So you would protect them with a stretched cluster: if your entire site dies, no problem, all your data lives on the second site and your workloads come up. This poses a lot of problems for Kubernetes, because Kubernetes was designed to not come back from those kinds of failures. It was essentially designed so that if something dies it stays dead; it isn't recovered somewhere else. That does cause some challenges, particularly from a failure-domain architecture point of view. It is very, very challenging to try to align Kubernetes failure domains with that.
B
The separation logic inside Kubernetes, be it anti-affinity policies or whatever, no longer applies, because the underlying topology has changed from what Kubernetes thought it was. So there are reasons why some companies want to use a stretched cluster, essentially "this is the way we've always done things" or "the only cluster we have is a stretched cluster, please tell us we can just run it there and we'll pin it to one site" or something like that, but it is not an architecture you should go into thinking...
B
...this is the best way to run Kubernetes, this is the best way to make everything redundant, because it will just cause you headaches and a lot of pain down the road. That said, the CSI driver now supports it, and this is essentially just a sign-off that says: yes, we will support you if you run it on a stretched cluster, but we're not recommending that you run it on a stretched cluster.
B
Yeah, it works the same way it's always worked. They did add some extra test cases, and I believe there were one or two edge cases that they caught, so there might have been a few really niche things that they fixed, but the larger work was validating it, figuring out how we support it from a GSS perspective, figuring out how engineering supports it. It was mainly validation, yeah.
A
And I suppose the one thing to consider, then, since it's not recommended: if you have these legacy-style databases running, your option would probably be to just run them outside of Kubernetes entirely. They should work perfectly fine the way they have for a decade or more, and you'd be able to take advantage of the stretched cluster you already invested in and get the high availability you paid for.
B
Precisely, yeah. Don't add a layer of complexity that you don't need. If you don't need to put your existing database on Kubernetes, don't do it. If it runs fine in VMs and you've got it protected by a stretched cluster and it works, leave it alone. Build your new stuff on Kubernetes and leave the old stuff where it is until you decide to re-platform onto a completely different, truly cloud-native database. Sorry, my dog just saw someone outside.
A
Okay, thanks, Miles, for covering that. Another update: a number of new cloud provider versions came out in the last couple of weeks. The latest version is 1.2.6, but they backfilled some bug fixes, and maybe even some feature enhancements, onto some of the older versions as well.
A
We support those because sometimes distributions or users are running on something less than the very latest release, so it can be important to support users who are on things going back a few versions.
A
I'm not going to read you the whole list here. One thing is that the Helm chart moved. A Helm chart is the recommended way of installing the vSphere cloud provider if you're going the route of pure open source, or if you're in the position where you actually publish a distribution or installer of Kubernetes yourself.
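As a rough sketch of that route: the repo URL and chart name below reflect my understanding of the upstream cloud-provider-vsphere project and may differ from what the release notes now point at, so treat them as assumptions and check the linked notes. The vCenter connection details go in a values file per the chart's README:

    # Install the vSphere cloud provider (CPI) from its Helm chart
    helm repo add vsphere-cpi https://kubernetes.github.io/cloud-provider-vsphere
    helm repo update
    helm install vsphere-cpi vsphere-cpi/vsphere-cpi --namespace kube-system -f my-values.yaml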
D
I've got a question on that. We're in the case you mentioned, where we still have older versions of other things that have blocked us from migrating to the CSI driver. One of the things we found was that the vSphere API can basically cause the kubelet to not fully start up.
A
I'm not all that familiar with that, but they linked, I think, two or three issues or bug reports that have been fixed, so the details should be in there. If you've been having this issue yourself, hopefully it aligns with one of those open issues; otherwise you should open a new one.
D
Yeah, we have an issue open internally with VMware, our VMware support, on the vSphere API basically needing to get restarted anyway. But when that was happening, we were experiencing impact on the clusters: if the kubelet restarted during that time, it would try to connect to the vSphere API, the API wasn't available, and the kubelet would get stuck in a state where it couldn't finish starting up.
B
As far as I'm aware, if you're on CPI, and David's on the call so maybe he can confirm here as well, it's not an integral part of the kubelet anymore, so the kubelet should be able to start up without CPI establishing a connection to the vSphere API, because it's an external component.
E
That's true. I have a cluster where the nodes crashed and came back up while my vCenter was not accessible; it's, you know, not 100% functional, and the kubelet is up and everything is running. You've got other issues around provider node refs and things like that, but the cluster will come back up. The node's kubelet is no longer in the data path for access to the cloud.
D
It was a little mysterious, because the kubelet would look like it was up and it would start reporting to the API server, and then it would just stop. It was like it partially got up, hit that spot where it was trying to make that connection, and then just stopped, but it still kind of looked like it was up. It just wasn't ever communicating back to the Kubernetes API server after that point.
B
Yeah, I mean, the in-tree provider we know has a lot of issues. It was a prototype thing that we put out years and years ago, and that's why we built CPI and CSI: to get away from a lot of those inherited problems that the vSphere cloud provider had. I would say a lot of those weird niche issues, especially the stuff to do with disk failures and failover...
A
Yeah, and the Kubernetes project itself keeps kicking this can down the road. For two years now it's been "one year from now we're going to kick the in-tree providers out of the tree and you will be forced to migrate." The reality is, I think, they're maybe exaggerating the speed at which that's going to happen in order to encourage people to actually migrate.
D
Yeah, yeah. We've got some hardware that can't get moved to that, so we're looking at it like, okay...
D
Are we going to separate those off and do something different with them because of what the hardware is? But anyway, we're getting close to being able to start testing so that we could move some of our environment, but then it gets confusing when half your environment is like this and half is like that, so it's almost easier to just wait until you can move everything. But these versions that are called out in that note, is that the Kubernetes version?
A
These align with the version of Kubernetes that that cloud provider supports, but I believe they will support older versions than that, just not newer ones. So there is some correlation where these semantic versions do track, but it isn't necessarily one-for-one.
D
I guess my question is: if I'm talking about just the in-tree provider, am I getting those updates, like it says here, when I get to that version of Kubernetes?
E
Okay, yeah. For the CPI in general, the major and minor version has to match the Kubernetes version. So any 1.20 needs to be with a 1.20 CPI, but any 1.20 will work with any 1.20 of the CPI; that's the general contract. Beyond that, 1.21 same thing, 1.22 the same thing, but you can't use a 1.21 with 1.20.
E
There is a contract about that just in terms of breaking changes. It doesn't mean it won't work, but with the changes of Go versions and different Kubernetes things being vendored in, the supported way is to be on the same release, same minor.
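A quick way to sanity-check that pairing is to compare the cluster's minor version with the CPI image tag. A minimal sketch, where the DaemonSet name and namespace are assumptions based on the upstream manifests:

    # Kubernetes server version
    kubectl version
    # Image (and therefore version) of the running vSphere cloud controller manager
    kubectl -n kube-system get daemonset vsphere-cloud-controller-manager \
      -o jsonpath='{.spec.template.spec.containers[0].image}'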
D
So the next, or rather the last, bullet point you see there: it says the vSphere CSI driver now has no dependency on CPI.
A
I'm not sure what the scenario would be where that matters, but apparently there is one, because they've called it out.
B
Yeah, that's an interesting one. I'll be honest, it was an internal engineering decision: the engineering team behind CSI wanted to allow CSI to work without any dependency on CPI, and the only thing it ever depended on CPI for was populating the node UUID, which CSI has now moved away from. They use a different method, instead of the node UUID, to do storage alignment with the node it's on. So there is no tight coupling between the two anymore.
B
You don't need the vSphere CPI now. As I understand it, from a Kubernetes perspective you either choose an in-tree cloud provider as part of the kubelet bootstrap or choose external.
E
Although it does have load balancing capabilities now: with NSX it can do NSX load balancing for you, if you have NSX-T, yeah.
A
I think Dan has had some other, perhaps more recent, talks on it in other venues; I can't remember if it was a KubeCon or a Cloud Native Rejekts or something like that, but yeah. I'm really a fan of it myself and run it in my home lab. I don't know that it would hold up as a load balancer in a large-scale enterprise scenario, but certainly for a home lab, and maybe for edge.
B
That's a good question, David. It has its own implementations of zones and regions as well. As far as I understand, it's never actually used the CPI for zones and regions; they've always done their own. I think it ignored the stuff that the CPI put in there. I might be wrong, but that's just what I recall from the last time I talked to engineering about it.
C
Yeah, so then, if Kubernetes starts to do zone placement for nodes and things like that, does it take the side of the CSI stuff or the CPI, if CPI is up and running? Which one wins out? Because there are only, well, two labels, right, that actually determine where that happens. So if you have both running, will they conflict, or...
E
They don't conflict with one another. One is for pod scheduling and the other is for persistent volume scheduling; they're used for different things. The CSI's labels will not deal with pod scheduling onto nodes; the CSI purely does zone topologies for where to create the persistent volume.
E
It's purely zoning for persistent storage, not for the pods. So if you want to be fully availability-zone topology aware, you're going to need the CPI in order to get topology awareness in pod scheduling onto nodes.
C
Yeah, it's interesting, because if there's a whole separate mechanism to do volume scheduling somewhere, it would make sense that you would want the pod to land in a place where it could actually access that volume. That's why I brought up the question; I was just interested.
E
The only way of really solving that without the CPI, I mean, you can set on volumes whether to provision before scheduling, or, I forget the names, but there's "provision the disk before creating the pod" versus "schedule the pod and then create it"; there are different settings you can use there, so you could probably play with the ordering, yeah, WaitForFirstConsumer, and maybe get away without the CPI, but it seems pretty...
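The setting being referred to is the StorageClass volume binding mode. A minimal sketch with a made-up class name: WaitForFirstConsumer delays PV provisioning until a pod using the PVC has been scheduled, so the volume gets created somewhere the pod can reach it. The provisioner name is the vSphere CSI driver's:

    # StorageClass that defers volume creation until a consuming pod is scheduled
    # (save as wait-for-consumer-sc.yaml, then: kubectl apply -f wait-for-consumer-sc.yaml)
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: vsphere-wait-for-consumer
    provisioner: csi.vsphere.vmware.com
    volumeBindingMode: WaitForFirstConsumer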
E
I think the only case where I see that is, in the docs you mentioned, there's something for the adventurous who want to try Kubernetes on a VM on ARM-based things. If you try things like ESXi on ARM, or on laptops running these, you may be a bit more constrained in resources, and the CPI does take quite a lot of resources, relatively, so it's just getting rid of another pod. So I think it's for edge cases.
A
Yeah, I think this whole topic of zone awareness and reaction, where potentially you've got infrastructure failure domains as a lower layer beneath Kubernetes failure domains, and we've had chats about this before, is by necessity complex if you're not going to be opinionated and force people into particular choices, which VMware infrastructure tends not to do.
A
You could elect to have a data center that has redundancy by rack, by aisle, by region; it's all out there if you want to do it, but it makes it hard to put out a canned solution that just works out of the box without configuration settings.
A
Anyway, let's move on, unless somebody has...
D
I have one more question on the CSI stuff. I read somewhere that there was a minimum hardware version of 15.
B
It's for all the nodes that are going to use CSI, so any Kubernetes node has a minimum requirement of VM hardware version 15. I've been asked a lot in the past whether it can be used with a lower hardware version, and people say, hey, I got it to run with VM hardware 13. You can get it to work with 13, but there are some nasty bugs and edge cases that are resolved in v15, particularly around failures and disk remounting, that you do not want to get involved with.
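If you want to audit your node VMs for this, one option is govc from the govmomi project; a minimal sketch, where the VM path is a placeholder, and the hardware version is also visible in the vSphere client:

    # Show VM details, including the virtual hardware version
    govc vm.info /DC1/vm/k8s-worker-01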
B
Right, but that was the first release, and we didn't realize at the time that these bugs were in it; we discovered them after the fact, which is why we retrospectively said v15 is where you need to be. But as far as I know, vSphere 6.7 U3 does support VM hardware 15.
B
I just googled it and found my own blog on it, and my own blog says the requirement is v13 for 6.7 U3. So let me check internally; it might have been that for 6.7 U3 and the first version of the CSI we said 13, because that was what we originally tested on, and then for the newer stuff...
B
As far as I remember, the actual issue was to do with persistent IDs for the disks: in v13 that doesn't exist as a feature of the VM hardware, and we don't retrospectively port features into a VM hardware version, we rev the version, and the ability to have persistent IDs for disks was added in 15. So I've always remembered it as a bug fix and not a feature thing, but I might be misremembering.
B
Oh, there you go: VM hardware 15 is from 6.7 U2. Perfect. So if you're on U3, you should be good.
A
I'll move on again. Somebody a few minutes back mentioned running ARM; this is a different form of ARM. There is a new tech preview, tech preview meaning it's potentially not for production, it's experimental, and we're looking for feedback from users, of the desktop hypervisor for the Mac, Fusion, and it does support the recent Apple M1 hardware.
A
The relevance to this group: we declare that we cover all the VMware hypervisors, and some people might be trying to run minikube or some other form of Kubernetes there on a laptop. So I've put a link in the agenda notes document, and I'll cut and paste it into the chat too, since I'm not sharing my screen anymore, but you can go enroll in that tech preview, get a download, and play around with it.
A
I don't run a Mac myself, but I took a look at it this morning and it does look like there might be some bug reports and things there. So, like all tech previews, this is for the more adventuresome.
A
Finally, another note, and this might relate to David's comment about zoning, because I think we'll get into some coverage here: the KubeCon Europe conference is coming up in May, the third week I think, and this group has a maintainer-track session there. Michael Gasch will be presenting on the eventing integration with Kubernetes, and if you'll recall, about a year ago Michael, along with William Lam, gave a presentation to this group about their Fling that uses event-driven programming to monitor things going on in the VMware infrastructure.
A
That could be combined with events being generated at your Kubernetes level to provide a unified way to keep those things together for troubleshooting. Anyway, that session is occurring in about a month; I think it will be good. That said, I'm co-presenting with Michael and we don't have the deck done yet, so if you've got any requests for things to be covered, I can't promise we'll get them in there, but maybe. I would anticipate it will be an update on the presentation William and Michael did about a year ago.
A
Also, that KubeCon conference is anticipated to be in person. Who knows what's going to happen; the COVID news seems to change week by week, month by month. But back in pre-COVID days, when that KubeCon was destined to be in Amsterdam, Robert and I were hoping to actually have a physical meeting of group members, and for now, anyway, I'm planning on attending in person.
A
Yeah, same here. Okay, well, great, let's make that a tentative plan, then, because it's been a while since we've had that face-to-face experience and I'd look forward to it.
A
If the group is four, I think I can get away with buying beverages and maybe a little food, so we'll see what happens.
A
So we've covered our agenda; that was the last item in the notes document, but we've still got five more minutes if anybody has anything they want to bring up, either to talk about now or to nominate as a topic for next month.
A
Okay, thanks, everybody, for attending. If anything comes up later that you wish you had brought up, go for it in the Slack channel, and otherwise we'll see you in a month. Maybe some of you we'll also see at KubeCon Europe. Bye.