From YouTube: Kubernetes VMware UG 20210107
Description
January 7, 2021 meeting of the Kubernetes VMware User Group. This meeting hosted a presentation on utilizing GPU resources from Kubernetes-hosted applications and services when Kubernetes runs on vSphere hosts with GPU hardware available on some or all of the hypervisor nodes.
A
Hi, welcome to the January 7th meeting of the Kubernetes VMware User Group. On the agenda today we have the item of GPU support for Kubernetes workloads when Kubernetes is hosted by vSphere.
A
My co-host is hopefully destined to be Myles Gray, co-chair of this group, but he's not present yet. I did communicate with him last night and he said he intended to be here and was checking his lab, so I'm hoping he shows up in the next few minutes, but right now he isn't here.
A
I have posted some links in the agenda notes document that relate to GPU support for Kubernetes on vSphere.
A
There is a YouTube video recording which is an introduction, and there's also some documentation which outlines the exact requirements. I'm not an authority on the subject, but the last time I looked at it, you can't take just any old GPU; there are certain specification requirements they'd have to meet. Most of the demonstrations I've seen relate to NVIDIA GPUs.
A
Okay, I'll take silence as no, or somebody is having trouble finding their unmute button, but that's okay. Do we have anybody here who has experience with running GPUs on vSphere?
B
I actually do. I'm with a consulting firm, a partner of VMware, and we have a customer that's very interested in machine learning, so obviously GPUs are going to play a role in that, and I'm really just trying to figure out what this means to them.
A
Yeah, I did just a little bit of research while putting those links in the agenda notes, and it seems like it's something that is definitely catching on. Certainly when Kubernetes is run in the public clouds, Amazon, Azure, Google Cloud, there is already support for using GPUs for workloads in those clouds, and I think that at least for some of them they can be exposed to Kubernetes.
A
Oh, I don't know if you'd call it multiple tenants or multiple workloads. In some circumstances the actual compute hardware is not necessarily shared across multiple accounts, but sharing across multiple apps is still a case where you don't necessarily want to dedicate a full hardware GPU, so the Bitfusion technology has a technique for taking a large physical GPU and carving it up into smaller virtual instances to create the illusion that workloads each get a GPU.
A
Even if, under the covers, at some level they're getting a fractional resource.
A
Okay, I just got Slacked by Myles, so he's on his way; I think he'll be here in a couple of minutes. But anyway, just as a summary of what's going on, I've heard that machine learning is definitely an app calling for GPUs, and there's two forms of it: training your machine learning model, and then executing it later.
A
I've read of a lot of use cases for people doing image recognition too, where they've got an application with cameras and they're trying to use the GPUs to filter that large data stream to find interesting deductions from it, whether that's recognizing people or license plates, or controlling machinery for IoT. I see Myles has now popped onto the participants list, so we'll give him a few minutes, I think, to get his rig set up.
C
Yeah, sorry folks, I've been on PTO, so my notifications have been off; I apologize for being late for this. I am in a state where I'm good to go, though, so I can just share my screen if you guys don't have anything ongoing.
A
Yeah, we've just been having a little chat about possible use cases, but go for it.
C
Sure thing. Okay, so "host has disabled"... okay, there we go, just one, okay. Can you see my screen? Yes? Awesome. Okay, so again, apologies for being late; I completely lost track of time. So what we've got here is a TKG cluster. It's a TKG Service cluster, so part of vSphere with Tanzu, but that's not important; it could run on any Kubernetes cluster. The way this demo and the way we wrote this app means that it should be portable across any distribution, so it's not specific to vSphere.
C
What's Tanzu or TKG? That's just the way that we have it set up. So on here you can see I've got this namespace, which is a vSphere with Kubernetes namespace for those of you that aren't familiar with it, and underneath that we've got a TKG cluster called endor, and inside our endor cluster we've got this application, which does counting of flowers. There's a whole story behind this that we made up for a VMware roadshow that we're doing, but the context isn't really that important.
C
In any case, we've got three master nodes and four worker nodes here, and there's a few things in this environment that you sort of need to be aware of that are actually in use here. So we've got Bitfusion installed, and if you're not aware, Bitfusion is a way that you can ingest GPUs into a server virtual machine. You can see we've got bitfusion-01, 02 and 03.
C
I'll show you those in the inventory here in a second. And then we've got clients, and you can see that we've got a whole bunch of clients here, because we've just had various iterations of this spun up over time. The reason it keeps them all is because the idea is that, as an MSP, you could bill for this: you can allocate a portion of a GPU and get charged per GPU-hour or something like that. So we've got Bitfusion installed.
C
So in Edit Settings you can see, under PCI device, we've got a Tesla T4 NVIDIA GPU passed through to this VM, and the same is true with bitfusion-02 and 03. So on bitfusion-01, 02 and 03 we've got the Bitfusion server software, which allows it to allocate a portion of a GPU over the network to one of these clients.
C
So what I'd like to show you next is our application itself; let me just move some of this Zoom stuff out of the way here. First of all, all the code is on GitHub. So if you want to actually build this, or if you want to run this in your own environment and you have access to GPUs and the Bitfusion bits (which you should have if you're a VMware vExpert), then you can just use our container images.
C
So that's all documented within the repository here, and if we go into the Bitfusion one, there's a few little bits and pieces that you need to add, like servers.conf, which is how the container finds the Bitfusion servers, and then client.yaml, which is like a token; it's an authentication mechanism between the container and the Bitfusion server itself.
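As a minimal sketch of how those two files might end up in the image, assuming a stock TensorFlow GPU base image and guessing at the config paths (neither is taken from the actual repo's Dockerfile):

```dockerfile
# Hypothetical sketch: bake the Bitfusion client config into the worker image.
# Base image and destination paths are assumptions, not the repo's real file.
FROM tensorflow/tensorflow:2.3.0-gpu
# servers.conf tells the Bitfusion client where the Bitfusion servers live
COPY servers.conf /etc/bitfusion/servers.conf
# client.yaml carries the auth token trusted by those servers
COPY client.yaml /root/.bitfusion/client.yaml
```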
C
So that's how we build the Docker container; there's this worker container, which actually does the image processing, and then we've got the Kubernetes manifests in a separate repository, just for cleanliness really. We deployed kube-prometheus; we've got the whole run book here. So if you want to deploy this, we've got the TKC YAML in here, the pod security policies, the namespaces.
C
So I've just documented how you would patch your credentials into each namespace, so you can get around that, and then deploy Prometheus and the application.
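For anyone reconstructing that step, a hedged sketch of one common way to patch registry credentials into a namespace; the secret and namespace names below are placeholders, not values from the run book:

```sh
# Create an image-pull secret and attach it to the namespace's default
# service account (names here are placeholders).
kubectl create secret docker-registry regcred \
  --docker-username=<user> --docker-password=<password> \
  --namespace flower-market
kubectl patch serviceaccount default --namespace flower-market \
  --patch '{"imagePullSecrets": [{"name": "regcred"}]}'
```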
The reason that we deployed Prometheus here is because we actually use a Prometheus external metric. If you're not familiar with Prometheus, it's got a concept of internal metrics, which are Kubernetes-only metrics like pods and namespaces, stuff like that; it's got custom metrics, which can be made up of those Kubernetes objects but are abstracted one layer; and then there's external metrics, which can come from a completely different outside system, and that's the way that we chose to run it for this one.
C
So there's a whole bunch of manifests in there that show you how we created those external metrics, and then we use those metrics in something called the Kubernetes horizontal pod autoscaler. So whenever we change the desired state of frames per second, or flowers per second, for this application to process, it'll automatically scale out the app and allocate more GPUs, give it more crunch, so that it can get through these things quicker.
C
So we thought this was quite a nice model: deploying the application and having it scale based on a desired state of performance, rather than just saying, okay, give it two GPUs and whatever it does, it does. It'll allocate GPUs until it meets the desired state, which we thought was quite clean. And then you deploy the application itself. So we're going to have a look at the manifests, and I'm going to open up Visual Studio Code to do this, just for better syntax highlighting and such. So under here, this is our deploy.yaml, and you can see everything is in this deploy.yaml.
C
This one is another deployment, and it deploys the actual workers, or, as we call them in this case, wookies, because we went for a Star Wars theme. You can see here that this app is called flower-counter-worker, and this one is just called flower-counter. So this is the dashboard one, and this is the actual thing that does the number crunching, the GPU work. The images, as you can see, are based off of these GitHub repositories.
C
I'll also note that the repos here have GitHub Actions enabled on them. So whenever we do a push, it automatically builds the container and then pushes it to Docker Hub. So if you clone the repo, I think it clones the actions as well, because it's under this workflows thing, so there would be a few bits and pieces in there that you would need to populate out. I think... there you go, you need to populate the repo, and you need to populate your... there's something else as well.
C
Anyway, if you have a look through the code, there is a secret key, so you'll get it. That's done in your repo settings here, so you can have the full CI setup the way that we have it too.
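A rough sketch of a workflow like the one described, with placeholder secret names you'd set under the repo settings mentioned above (this is not the repos' actual workflow file):

```yaml
# Hypothetical build-and-push workflow; secret names are placeholders.
name: build-and-push
on: push
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Build and push to Docker Hub
        env:
          DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
          DOCKER_PASSWORD: ${{ secrets.DOCKER_PASSWORD }}
        run: |
          docker build -t "$DOCKER_USERNAME/flower-counter-worker:latest" .
          echo "$DOCKER_PASSWORD" | docker login -u "$DOCKER_USERNAME" --password-stdin
          docker push "$DOCKER_USERNAME/flower-counter-worker:latest"
```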
So we use those images, and we export port 8080 as metrics for Prometheus on each of these workers.
C
You can omit the dashboard deployment and it'll work fine. These two are kind of interesting, actually: partial GPU, and these are Bitfusion parameters. If you look through the Dockerfile for this image, there's descriptions of what all of these things do and what we've exposed. So you can see I'm asking for half of a GPU from one GPU, or you can change this to two; you could say I would like half a GPU from each of two. I don't know why you would do that; it's just an option.
C
That's there, but yeah, you can ask for a partial amount of a GPU, and what you're allocating here with Bitfusion is the amount of frame buffer. So say you have a GPU with 24 gigs of RAM: then you'll get 12 gigs of RAM allocated for that one, say, TensorFlow workload, which is what this is in this case. And then batches is the number of flowers that we're going to put through it.
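As a hedged illustration of those knobs in a pod spec; the environment variable names here are hypothetical stand-ins for whatever the Dockerfile actually documents:

```yaml
# Illustrative container spec fragment; env var names are hypothetical,
# the real ones are documented in the worker image's Dockerfile.
containers:
  - name: wookie
    image: <flower-counter-worker-image>   # placeholder
    env:
      - name: PARTIAL_GPU   # fraction of frame buffer to claim per GPU
        value: "0.5"        # e.g. 12 GiB of a 24 GiB card
      - name: NUM_GPUS      # how many GPUs to take that fraction from
        value: "1"
      - name: BATCHES       # benchmark batches to push through TensorFlow
        value: "2000"
```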
C
It's just a benchmark; it's not actually processing flower images, because we don't have a bank of stuff that big yet, so we just run a benchmark and we do 2000 batches to generate a frames-per-second number. So that is the deployment for the thing that actually does the TensorFlow work and the GPU work, and then we have two services, one for the dashboard. Again, you don't really need this.
C
This is just something we have for the roadshow, but what you do need is this service for the metrics. This takes the metrics on each port and then exposes them as a service internally to the cluster. All right, actually, this is a service of type LoadBalancer, so we could debug it externally, but this is what Prometheus then scrapes, and we'll look at this in Prometheus itself.
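Roughly, that metrics service is shaped like this; names and labels are best-effort reconstructions rather than the repo's exact manifest:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: flower-counter-worker   # reconstructed name; see the repo for the real one
  labels:
    app: flower-counter-worker
spec:
  type: LoadBalancer            # only so it can be debugged externally
  selector:
    app: flower-counter-worker
  ports:
    - name: metrics
      port: 8080
      targetPort: 8080
```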
C
It'll look for a label of flower-counter-worker on a service, find all pods behind that, and then scrape those for metrics, and you can see those are your Prometheus service monitors there. So we've got two: our dashboard FPS and our worker FPS. Our worker FPS is the one that we actually use in this; the dashboard one is just a remnant of us doing some testing here that we haven't cleaned up.
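That label-based discovery corresponds to a Prometheus Operator ServiceMonitor along these lines; the label key and names are guesses based on what's described here:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: worker-fps
spec:
  selector:
    matchLabels:
      app: flower-counter-worker   # the service label Prometheus matches on
  endpoints:
    - targetPort: 8080             # the workers' metrics port
      interval: 5s                 # scrape every five seconds
```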
C
So this takes the FPS from each of these instances of one of these containers that's doing GPU processing, and scrapes it every five seconds. You can see the interval is five seconds and the target port is 8080, which is the metrics port that we mentioned further up. And this is the thing that actually does the magic here: this is a horizontal pod autoscaler.
C
It is using autoscaling API version v2beta2; you need that to use this new syntax for external metrics. And what you'll see here is we've set a max replicas of four and a min replicas of one. It's going to target the deployment a-new-hope-wookie, which is the worker that actually does the TensorFlow stuff, and I've also changed some of these advanced overrides from their defaults, just to make it easier to demo whenever we were doing the roadshow.
C
The interesting piece here, though, is the actual metric itself. You can see the metric is type Object; it's nice and generic, because this is how we like to do things in Kubernetes land. We say the metric is going to be called flowers per second total, the object is in the namespace flower-market, and the value, the target value, is 500. And for whatever reason, and I have not been able to figure this out, there is no way to have a horizontal pod autoscaler make an inverse relationship to a metric.
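Putting the fields just described together, the HPA looks roughly like this; the numbers come straight from the talk, while the object names are transcribed best-effort rather than copied from the repo:

```yaml
apiVersion: autoscaling/v2beta2   # required for the Object/external metric syntax
kind: HorizontalPodAutoscaler
metadata:
  name: wookie-hpa                # reconstructed name
spec:
  minReplicas: 1
  maxReplicas: 4
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: a-new-hope-wookie       # the worker doing the TensorFlow/GPU work
  metrics:
    - type: Object
      object:
        metric:
          name: flowers_per_second_total
        describedObject:
          apiVersion: v1
          kind: Namespace
          name: flower-market
        target:
          type: Value
          value: "500"
```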
C
It is always 100% positive and linear; I cannot figure it out. So basically, if we say the target metric is a thousand frames per second, for example, Kubernetes will look at that, look at the current metric, and say, well, that's lower, so that's good. There's no way that we can actually say we would like this metric to target a higher value rather than a lower value.
C
It's always striving for a lower value. You'd probably be able to do some math thing in Prometheus and come up with a nice statement that inverts everything, but just know that at the minute, if I say 500, it's going to give me one replica, and then if I reduce this and say 10, which you'll see now, it's going to say, oh, I haven't met that, so it's going to start scaling up. Okay, so that's the dry stuff out of the way. What does it actually look like?
C
So this is the application itself. The scaling isn't playing too nice here at the minute; these are actually full numbers here, it's just that I presented this on a larger screen last time. So let me just get this like this, so you can see what's going on. At the minute you can see we've got one wookie, which is one worker. It's doing 114 flowers per wookie, because we've got one wookie and it's currently processing 114 flowers in total. So that makes sense, right?
C
There we've got one wookie, 114 flowers per second, so that means we're doing 114 per wookie. Here's the history, so you can see the number of flowers per second per wookie over time, and the number of wookies in total. So let's have a look at the Kubernetes side of things here. If I go into my iTerm, hopefully you can still see that; I've just gone full screen. Is that still there?
C
Yeah, it's still there? Okay, cool. So we're in our namespace. So if I do cube-cto... qctl... kubectl get pods (you can tell I haven't been back to work yet), we've got our dashboard and we've got our wookie. So if I do a kubectl get all, what we should see is the horizontal pod autoscaler at the bottom here, which is what we just looked at, and you can see it says 114, which is the number of flowers per second that we're currently processing, so that matches our metric.
C
So we're saying we're targeting 500 and it's got 114; min pods is one, max pods is four, replicas one. Okay. So what we're going to do is change this target from 500 to 10. Remember, I said there's an inverse relationship here, for whatever reason, but it is what it is. I'm going to change this to 10, and it's going to start spinning up new wookie nodes, so you'll see these start to increase. So at the minute: status Running, 1/1.
C
So let's go into the directory, which is this one, and let me just make sure I've actually saved that change; there we go, 10. We'll do a kubectl apply on the manifests, and what we should see is everything is unchanged except the horizontal pod autoscaler, which, you can see, has been configured. So now we do kubectl get hpa -w, which is get horizontal pod autoscalers with a watch on it.
C
What you should see now is it's going to realize that this metric, 10 (you'll notice the target's changed from 500 to 10), is now outside of its range, and you can see the replicas have just gone from one to four. So if I do kubectl get po -w, what we should see is there are indeed more workers being spun up. And if we go into our dashboard, you can now see that there are four workers.
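For reference, the demo steps just shown boil down to these commands (the manifest path is a placeholder):

```sh
kubectl get pods    # the dashboard pod and one wookie worker
kubectl get all     # the HPA shows at the bottom, current value 114
kubectl apply -f .  # re-apply the manifests with the target changed to 10
kubectl get hpa -w  # watch the replicas jump from one to four
kubectl get po -w   # watch the extra workers spin up
```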
C
What you'll see is there's a latency period here where it starts to ramp up before it actually starts processing the benchmark. What we can look at in the meantime: if we go into vSphere and Bitfusion, you'll see now that we've allocated half of this GPU, half of this GPU, and all of this GPU, because we've spun up more clients. Likewise, if I go into my clients here (and again, this is over VPN, so the UI is a little bit sluggish), there it is, and if we sort by allocated you can see.
C
Indeed, I've got four, and they're running half a GPU each as well. You can see the history of each one of these, so you can see this one spinning up, and if we go back in here... there we go, indeed, it started actually processing data. So we're now aggregating at 364 flowers across the four wookies, and you'll see that in the history here once it catches up; there it is.
C
It says we're doing 124 on one, 111 on another, zero on another, and 125 on the final one. So we've set this desired state where we said we would like you to achieve a certain number of flowers per second (granted, it's an inverse relationship, but we'll figure that out), and it has then allocated more and more worker nodes until they can achieve that, and also dynamically attached real physical GPUs to each of those containers. And, like I said, this will run on any distro; this is not TKG specific.
C
This will run on OpenShift, it'll run on vanilla Kubernetes. All you need to do is update your Dockerfile to include the Bitfusion bits like this. So this is all up on GitHub; if you want to have a look at this stuff, you can build this out yourself on whatever version of Kubernetes you're running. So, anyone got any questions on that, or anything they want me to poke at or have a look at?
D
Can I ask a question specific to Bitfusion?

C
Absolutely, go ahead.

D
How does Bitfusion compare to MIG?
C
How does it compare to MIG, yeah? You are outside of my area of expertise there; I thought it was going to be more Bitfusion specific. As far as I understand it, MIG has some changes to the networking layer as well, and it requires some specific bits to be installed on the host. I think it uses some kind of vGPU transport, or it uses GPU over RDMA.
C
Maybe; I could be totally off base there. But I know there were some considerations around MIG that made it more, quote-unquote, performant than Bitfusion. For ML-type batched workloads, though, where you're dispatching a batch to a GPU, letting it do a process, and it can give you a result back, that isn't as relevant; it's more for those real-time-type visual workloads that you would see a benefit there.
D
Okay, yeah, that was going to be my next question, because real-time inference with a frame-time return limit is one of our concerns.

C
Right, okay, yeah, and that would certainly be something to have a look at there.
A
I'm just curious about what the impact might be if you've got some of these workloads running and a vMotion occurs, just because I'm familiar with the history of potentially some issues with storage volume mounts, and I'm just wondering if perhaps these fractional GPU attachments to the workloads might be impacted by vMotions.
C
Right, okay, yes, so let me clarify; that's a good point, Steve, that I didn't quite make clear. So if we go into our hosts and clusters view here, and we have a look at our workers, what you'll see is that none of these workers, if I go into their Edit Settings here (if it lets me), actually have a PCI device attached. So the way Bitfusion works is it doesn't actually do a PCI attach; the PCI attachments are still on these Bitfusion VMs, and they don't change.
C
What the containers get is essentially just an Ethernet, an IP-address, path to a GPU. Bitfusion does some magic encapsulation of the CUDA API calls, transfers them over the network to the Bitfusion server, and the Bitfusion server then executes those commands locally. So it's not that we are mounting GPUs directly into VMs here, because that leads to its own complications around which drivers are installed, what OS you've got installed, all that kind of stuff; NVIDIA is very Ubuntu-centric with everything, so the distro would matter.
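To give a feel for that client side, the worker just wraps its process in the Bitfusion client CLI; a hedged sketch from memory (flag names may differ by version, so check bitfusion --help on your install):

```sh
# Run a TensorFlow job against half of one remote GPU via Bitfusion.
# Flags are a best-effort recollection, not verified against Bitfusion 2.5.
bitfusion run -n 1 -p 0.5 -- python benchmark.py
```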
C
Bitfusion is entirely Ethernet based. So if you do have one of your workers vMotion around, or even your Bitfusion servers vMotion around, you won't notice a difference. There is no PCI attachment; there's no Kubernetes understanding that there is a GPU allocated here. The GPU is connected straight to the container, and inside of that, the calls are just being transferred over the network to the Bitfusion server.
A
Okay, that sounds like a great architecture, then, for resiliency, where it should be pretty immune to things going on under the covers, as long as you don't drop network connectivity. And right, there's plenty of techniques for investing in a lot of redundancy there to keep your network connections live.
C
And that's what I really liked about the Bitfusion architecture, and why I chose it for this application: it is so dynamic. Say, for example, if you look at NVIDIA's offering today, you would have to add a dedicated GPU node to your Kubernetes cluster, and that cannot move from that Kubernetes cluster, so you have that GPU always pinned to that K8s cluster.
C
However, with Bitfusion, because the servers are centralized and the clients are decentralized, you can have it so that those clients don't have to come from the same cluster. I could have 10 or 20 different Kubernetes clusters all talking to the same three Bitfusion servers and getting their slices allocated to them, without having to build dedicated GPU nodes into my K8s clusters.
A
Okay, I'm just wondering, I know you already said you're not an authority on MIG, but I believe that's an alternate way of attaching GPUs to hypervisor nodes; is it the same, network attached, or do you even know?
D
Well, I'm not going to answer the question specifically, but it's locally attached to the host, and it is a VIB that gets installed to be able to do that. So...
D
Exactly: it's basically the new version of vGPU, and it does require Ampere architecture.
C
Oh, I think I know what the difference with MIG is: isn't it that, previous to Ampere, the only way they could do fractional GPUs via vGPU was frame buffer, whereas now with MIG they do frame buffer and compute isolation? So you get dedicated CUDA cores and VRAM.
D
Yeah, you can carve up the GPU however you want, specifically with timeshare and resources and everything, whereas with vGPU it was purely fractional.
A
Right. So another question, Myles, just for people who, as a learning exercise, might be trying to put together a lab or something to play around with this. I'm sure it's in the docs, but what are the hardware requirements for this related to the GPUs? Does it work with kind of any modern thing out there, or are there things to look out for?
C
Bitfusion only works with, quote-unquote, workstation or enterprise GPUs, as NVIDIA views them, and it's a legal thing. So you would have to have some kind of enterprise or workstation GPU that supports the vGPU standard, because that's what Bitfusion does to mount these GPUs into the server. So some kind of GPU like that; you could probably pick something up like a Tesla T4 on eBay relatively cheaply.
C
It's not going to be cheap, but it'd be very cheap compared to, like, the new Ampere stuff or whatever. So some kind of GPU that supports vGPU; a Tesla T4 is probably a good place to start, and there are lower models that you could probably pick up on eBay that would do the same thing. Aside from that...
C
...no other hardware requirements, really. You just need to be able to fit it into your host, and then the Bitfusion stuff is just an OVA that you deploy and mount the PCI device into, and then the Bitfusion bits, as you've seen, are just something that you slipstream into your Docker container. So it's really just that GPU; but sadly it can't be a consumer-grade gaming GPU.
C
You know, I do not know the compatibility matrix for Bitfusion, so I can tell you what this is running: this is Bitfusion 2.5, which is the latest, and this is 7.0 U1, which is the latest, on patch 2, the one that was released sometime in December, like December 18th. So it's the very latest vSphere and the very latest Bitfusion, in a lab. Probably not a problem, though, obviously, because this is running vSphere with Tanzu.
C
So it's actually kind of a stupid story that we made up for our VMUGs. The idea, because I was presenting with two Dutch guys, is that there is a flower market on Endor, and Luke and Leia work there; Luke is a data scientist and Leia is an infrastructure person, and they have to come together and figure out how they're going to make this new Kubernetes-style application that uses image inferencing to count the number of flowers going through this flower market at any one time.
C
So it's just kind of a dumb story behind it, but this is what we built. The truth is, it's just running a TensorFlow benchmark in the background, but it just makes it a little more interesting if you add a bit of a story to it, yeah.
A
And then some people in this meeting might not be familiar: you dropped that this was done for a VMUG roadshow, and they might not even know what VMUGs are. Those are like a different form of user group that's been associated with VMware for a long time, and they have sort of the equivalent of local and regional meetups going on.
C
So this was done for the Dutch VMUG, or, as they call it, the NLVMUG, and that was just before Christmas. It was all done in English, and there were three sessions, by a guy called Johann, by Niels, and by myself; it's a three-part series where we set the stage and it all builds up to the demo. So that was done before Christmas, but we will be doing this at other ones in the future.
C
We're just getting bookings and stuff sorted out at the minute, but there will be other VMware user groups that will be presented with the whole story behind this app.
C
No problem; again, sorry for being late. It completely slipped my mind and, like I said, no notifications because I'm on PTO, so yeah, I'm just glad that I could actually show it.
A
Well, last call for questions for Myles; just jump in if you've got them. If there aren't any, there was nothing else on the agenda, but one thing I'd like to bring up here, since we've got all these people on the call, is any suggestions or asks for content for the next meeting coming up in February, particularly since I know we've got a few users on the call.
A
It's a lot easier, speaking for both myself and Myles, if we get some hints as to what users want to see, rather than making the stuff up ourselves. And also, if anybody wants to volunteer to speak on subjects like use cases or user experience, we'd welcome that content.
D
When we had talked about that sometime last year, there was a question around making that a feature flag for the storage class, so that anything would work, because something like etcd doesn't have an enterprise partner to sell it through, right? So I'd be really interested to know if that is still being considered.
D
Since that's, I think, highly necessary for Kubernetes: to be able to do a feature flag on a storage class for that. If it is, or if it isn't, also how that feature works, and the performance of it and whatnot, right? Because there's no reason to replicate a piece of data nine times for something like etcd.
A
Okay, so on this ask, I think what I might do is find somebody who actually has worked on that storage. But is it to expose the capabilities of the underlying storage implementation up to the Kubernetes level, so that things can see it? Is that what you're asking?
D
And then it makes intelligent selections there: do I replicate, do I not, that kind of thing. The conversation originally was with enterprise, but in the conversations everybody seemed to agree that it made sense to be able to just expose that down for any application, given how many free, open source projects there are that support that type of functionality. So I'm curious if that was followed through on, and regardless, I would actually be interested to see that implementation, how it works, and the performance. Okay.
C
No, I do actually, yeah. So, Jordan, we had a presentation on that a while back from Gopala, and I believe you're talking about the Data Persistence platform. Sort of the problem with those kinds of presentations is they're very VMware-feature-centric and not Kubernetes-centric.
C
You know, they're proprietary technologies and they only apply if you're running on top of vSphere with Tanzu in particular, so it's not broadly applicable to everyone else on the call, and, this being a community meeting, we're sort of forbidden to talk about that kind of stuff. That said, there are a bunch of talks out there on the Data Persistence platform, the vSAN Data Persistence platform.
C
I can throw some into the agenda that you can have a look at, both by myself and by the product managers and engineers that actually built it. On your other question, with regard to the storage classes, I do know that there's a KEP that is open upstream, and it is there to expose storage features up to Kubernetes, or, you know, vice versa, down, so that the application can request...
C
...you know, "I don't need this replicated underneath", and if the storage understands that call, then it'll make sure it doesn't get replicated. At the minute, the only way that we do that is through the Data Persistence platform; that is the way that we've achieved it, and we have some certified partners, like you mentioned, that actually do that today. And we have had this question ourselves: there are plenty of open source offerings out there that do not have a commercial partner, for example Postgres, which we use at length inside VMware.
C
It doesn't have a commercial partner, so how would we offer that on top of the platform? Some of that remains unanswered, you know, just from ourselves at VMware, but the KEP itself is in progress, so there may be something coming to upstream Kubernetes that actually exposes those flags, which would make it a more generic thing rather than just partner- or enterprise-certified stuff.
A
Yeah, and just because I know we get some newcomers here, for acronym definition: KEP is Kubernetes Enhancement Proposal, and you can search the Kubernetes GitHub to go find those.
C
It should... the problem comes with, you know, scheduling it: what node does it get placed on? And then, if it is full storage, it has to be co-located with the local compute, which is not complex, but it's not a transparent operation today. You could do it with vVols pretty easily, because you just turn replication off, or ask for it to be placed on something that is not replicated. With vSAN...
C
...it's a little bit more complex, mainly not because you can't provision it; it will provision, you know, a faults-to-tolerate-zero object, and it will let the client access that storage. But you'll have non-linear failures: for example, if your storage is on one node and your compute is on another node, and it's a zero-copy thing, and you lose one or the other node, you'll lose your workload, and it is quite complex to try and figure that out.
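For context, the vVols and vSAN routes just described are driven by storage policy; a minimal sketch with the vSphere CSI provisioner, assuming an FTT=0 policy already created in vCenter (the policy name here is a placeholder):

```yaml
# Hypothetical StorageClass bound to a failures-to-tolerate-0 vSAN policy;
# the policy itself is defined in vCenter, its name below is illustrative.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: vsan-ftt0
provisioner: csi.vsphere.vmware.com
parameters:
  storagepolicyname: "FTT0-No-Replication"
```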
C
So we don't support that today, simply because the way that failures happen is just a real mess when you start to look at it like that; it's easier to pay the storage penalty than have to deal with the outages that would inevitably come from something like that. And that's why we built DPP, the Data Persistence platform, to try and deal with that stuff. But, like I say, Jordan, that's good feedback; I'll...
A
Yeah, and thanks, Jordan, for the input on future content. Anybody else got anything?
A
Okay, well, if we don't have anything else, we'll close this meeting a few minutes early, then. Thanks for coming, and the next meeting, as usual, will be the first Thursday of February. Bye everybody, thanks.