From YouTube: OCB: GPUs on OpenShift with Zvonko Kaiser (Red Hat)
Description
The PSAP (Performance Sensitive Application Platform) team has developed the Special Resource Operator (SRO), a template to enable hardware accelerators on OpenShift. Besides NVIDIA, we have enabled, and are still enabling, other vendors as well. For this installment, we are going to talk about SRO and its inner workings. We will conclude the talk with a demo and how SRO relates to the official NVIDIA GPU operator.
A: So without further ado, I'll let Zvonko introduce himself, talk about the topic, and maybe show a demo. If you have any questions, we'll have a live Q&A session at the end, so feel free to add your questions in the chat wherever you're watching this live stream from, and we'll relay them back here. So with that, Zvonko, take it away.
B: This session will cover some of the historical bits and pieces we encountered along the way while enabling GPUs, first on bare metal and then on OpenShift. We will explain what we've done, what SRO is, and how we used SRO to enable not only GPUs but also other accelerator cards in OpenShift. So without further ado, let's get started with what we'll be discussing today. We will cover a lot of topics.
B: A hook is nothing else than a callback during the container lifetime: we have a prestart hook, a poststart hook, and a poststop hook, and NVIDIA is mainly using the prestart hook to enable the GPU. Hooks can be used to extend a container runtime with functionality that the runtime does not have; in this case, to enable a GPU in a container. What the NVIDIA prestart hook mostly does is bind-mount devices, binaries, and libraries from the host into the container. This way, it makes sure that the host and the container are in sync regarding the versions running on the host and in the container. So the prestart hook configures the container to use GPUs.
B: Everybody who has mounted any volumes or files from a host into a container knows that something happens with SELinux in that regard. So I want to illustrate what really happens when you do mounts from the host into a container, because that is a crucially important point for the NVIDIA prestart hook: it is getting files from the host into the container. SELinux is nothing else than a labeling system. On the host, everything with a blue label here is the host context.
B: The blue labels, or the blue colors, know how to talk to other blue colors, how to talk to blue processes, and how to work with blue files. On the other side, all the red stuff is the container's SELinux context: everything red knows how to read red labels, how to read red files, and how to work with red processes.
B: So what you are essentially doing, if you mount files from the host into the container, is introducing blue labels into a red domain. The red labels do not know how to handle those blue labels, and that is why you always get a permission denied if you are not dealing with SELinux on a RHEL system.
B: One thing people do is run the container privileged, but they don't really know what happens when they run the container privileged. The crux here, now that you see the container and the host are both blue, is that a privileged container can do everything with the files and processes on the host as well. Containers running privileged have a special SELinux type, spc_t, and spc_t can interact with any labels from the container and the host.
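As a minimal sketch of the pattern being cautioned against here (the image name is hypothetical), this is all it takes to put a container into the all-powerful spc_t domain:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: privileged-example
spec:
  containers:
  - name: app
    image: registry.example.com/some-image:latest  # hypothetical image
    securityContext:
      privileged: true  # the container process now runs as spc_t and can touch any SELinux label on the host
```

On OpenShift this additionally requires a service account bound to the privileged SCC, which is exactly the kind of broad grant the talk argues against.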
B
What
then
people
are
doing
is
relabeling
host
files,
but
this
means
you
are
now
introducing
red
labels
into
the
blue
domain,
which
can
break
the
host
content,
and
this
is
one
worry
of
nvidia
if
we
are
changing
the
devices
on
the
host,
that
it
breaks
things
on
the
host
and
we
have
enabled
the
container.
But
now
the
host
is
broken.
B
So
the
solution
for
this
is
write
a
linux
policy
to
introduce
new
types.
What
nvidia
has
done
for
the
devices,
libraries
and
and
other
files?
That's
mounted
in
the
container
and
write
a
selling
policy
to
allow
the
container
to
read
these
special
files
or
special
entities
from
the
host.
This
is
illustrated
by
the
purple
labels.
It
can
be
read
by
a
host
and
it
can
be
read
by
the
container
and
the
container
has
to
not
run
any
more
privileged.
B
I've
posted
here
the
s
linux
policy,
we've
written
for
nvidia,
to
enable
to
enable
that
gpu
in
the
container,
an
updated
blog
post
about
how
to
enable
nvidia
gpus
in
containers
of
bimethyl
on
on
rail8,
with
all
the
updated
sl
linux
policy,
for
route
7
and
regulate,
and
for
people
who
are
curious.
How
a
simple
pre-start
hook
works.
I've
also
included
here
directory
for
a
simple.
B
I
call
it
oci
decorator,
it's
a
preset
hook,
which
is
backed
by
a
config
file
to
introduce
which
can
be
used
to
mount
devices,
files
and
other
libraries
from
the
host
into
a
container.
B
So
if
we
are
taking
a
default
install
with
free
master
nodes
and
free
cpu
nodes,
we
can
use
a
machine
set
to
scale
a
cluster
with
your
gpu
node.
So
just
you
can
just
use
a
cpu
machine
set
that
you
have
already
in
your
cluster
installed
change
the
instance
type
scale
it
up
and
you
have
a
gpu
node.
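A heavily abridged sketch of that idea, assuming an AWS cluster; the MachineSet name and instance type are hypothetical, and in practice you would copy a full existing worker MachineSet and change only the name and instanceType:

```yaml
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: mycluster-gpu-us-east-1a        # hypothetical: copied from an existing CPU MachineSet
  namespace: openshift-machine-api
spec:
  replicas: 1
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-machineset: mycluster-gpu-us-east-1a
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-machineset: mycluster-gpu-us-east-1a
    spec:
      providerSpec:
        value:
          # ...remaining provider fields copied unchanged from the CPU MachineSet...
          instanceType: p3.2xlarge      # the one real change: a GPU instance type
```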
B
The
problem.
We
have
here
is
we
have
now
a
heterogeneous
cluster.
From
the
openshift
point
of
view.
We
have
three
workers.
Openshift
does
not
know
that
we
have
a
gpu
node.
It
knows
that
we
have
some
cpu,
it
has
only
the
notion
of
a
worker.
So
what
we've
done
in
this
case
is
yeah,
let's
first
zoom
into
the
gpu
node,
to
show
some
features
that
you
might
have
on
a
node.
B: We can use those features, for example, for optimized workloads. If you are optimizing your workload for AVX-512, you want your workload to run on an AVX-512 node. If you have GPU workloads, which are special in this case, you want to run your GPU stack only on GPU nodes; you don't want to create a DaemonSet where the DaemonSet occupies the CPU nodes as well as the GPU nodes.
B
So
what
we've
done
is
we
introduced
a
a
software
which
is
called
node
feature
discovery
we've
written
an
operator
for
it,
so
it's
available
in
openshift
and
what
does
node
feature
discovery
does
is
exposes
node
features
as
labels.
So
we
have
labels
for
cpu
flags
for
the
kernel
for
operating
system
and,
as
you
see
on
the
right
side,
pci
10de
10d
is
the
pci
vendor
id
for
nvidia.
So
we
have
now
a
label
where
we
can
steer
either
with
a
node,
selector
or
port
affinities
and
anti-affinities
or
to
the
right
node.
B
Optimized
workloads
can
now
be
placed
on
the
right
node.
As
I
said,
gpu
stack
on
the
gpu
node
avx
512
optimized
workloads
on
the
avx
500
round
nodes.
Why
is
this
important
workloads
that
are
optimized
for
abx
512
will
not
run
on
any
other.
You
will
just
get
a
legal
instruction,
so
you
want
to
make
sure
that
your
port
lands
on
the
correct
node.
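For example, a minimal pod sketch (hypothetical image) that uses the NFD vendor label to land only on nodes with an NVIDIA PCI device:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-stack-pod
spec:
  nodeSelector:
    feature.node.kubernetes.io/pci-10de.present: "true"  # NFD: a PCI device with vendor ID 10de (NVIDIA) exists
  containers:
  - name: app
    image: registry.example.com/cuda-app:latest          # hypothetical image
```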
B
I've
added
here
some
links
for
node
feature
discovery
and
how
we
are
using
also
nfd
for
multi-architecture
image,
builds
optimized
low-level
libraries
for
openshift
and
how
to
steer
those
libraries
and
workloads
to
nfd.
B
So
now
that
we
have
bootstrapped
our
cluster,
we
know
where
our
gpu
noticed.
We
need
to
think
about
how
to
enable
this.
B
So
we
started
thinking
about
how
we
can
support
immutable
hosts
since
openshift
3.8
and
we've
done
some
experiments
on
atomic
and
what
we've
come
up
is
a
thing
called.
A
driver
container.
A
driver
container
is
not
only
a
delivery
system
for
kernel
modules,
but
it's
also
a
a
image
that
handles
the
drivers
and
starts
demons
that
are
needed,
create
some
sys
controls
and
all
the
things
a
privileged
container
would
need
to
do
to
enable
the
hardware.
B: We run a small validation step, the driver container validation: a small workload that uses the accelerator just to verify that the drivers are loaded and that we can access the hardware. In this case we run CUDA vectorAdd. The nice thing about CUDA vectorAdd is that it allocates memory, so we know that memory is working, and it does some computation, so the compute cores are working. After running CUDA vectorAdd, we can be sure that the drivers are installed correctly.
B
Next
step
is
device
plugin
the
device.
Plugin
is
the
piece
that
exposes
the
accelerator
to
the
cluster
as
a
extended
resource,
so
pods
can
allocate
an
extended
resource
and
can
be
scheduled,
but
that's
what
we
are
doing
as
a
next
step.
The
device,
plugin
validation,
is
simply
the
same
workload
before
by
just
allocating
a
extended
resource.
So
we
want
to
make
sure
that
the
extended
resource
allocation
works
and
again
that
we
can
run
a
gpu
workload
on
the
cluster.
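A minimal sketch of such a validation pod; the image tag is illustrative, any CUDA vectorAdd sample image works:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd
spec:
  restartPolicy: OnFailure
  containers:
  - name: vectoradd
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.2.1  # illustrative tag
    resources:
      limits:
        nvidia.com/gpu: 1  # the extended resource advertised by the device plugin
```

The scheduler only places the pod on a node where the device plugin has advertised nvidia.com/gpu, so a successful run validates both scheduling and the driver stack.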
B
We
have
enabled
the
hardware
we
exposed
it
to
the
cluster.
The
next
step
is
monitoring,
so
we
are
setting
up
prometheus
and
grafana,
adding
metrics
a
custom,
note
exporter
that
exposes
gpu
metrics
and
we
are
also
adding
alerts,
so
it
is
visible
in
the
cluster.
If
something
happens,
if
it's
overheating
malfunctioning
and
stuff
like
that,
when
monitoring
is
done
as
an
optional
step,
we
can
deploy
a
siteguard
container
for
nfd.
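A sketch of such an alert as a PrometheusRule; the metric name assumes a DCGM-style GPU exporter and the threshold is arbitrary:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: gpu-alerts
  namespace: openshift-monitoring
spec:
  groups:
  - name: gpu
    rules:
    - alert: GPUOverTemperature
      expr: DCGM_FI_DEV_GPU_TEMP > 85   # assumed exporter metric; adjust to your exporter's metric name
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "GPU on {{ $labels.instance }} is running above 85C"
```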
B
Nfd
has
a
hook
mechanism
where
you
can
exploit
nfd
to
to
extend
the
labeling
system
with
your
custom
sources,
and
in
this
way
we
are
using
the
sidecar
container
for
nfd
to
expose
more
sophisticated
features
of
a
gpu
like
the
gpu
type,
the
gpu
name,
the
memory,
the
cuda
cores,
firmware
version,
driver
version
and
stuff
like
that.
So
in
the
case
of
aiml,
you
want
to
use,
for
example,
a
v100
for
training
and
want
to
use
a
t4
for
inferencing.
B
So
with
the
feature
discovery
and
those
those
exposed
new
labels
with
a
prefix
like
nvidia.com,
you
have
more
fine-grained
scheduling
to
see
your
workloads
to
a
specific
gpu
nodes.
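For instance, with a label such as nvidia.com/gpu.product exposed by the sidecar (the exact value shown is hypothetical), a training pod can be pinned to V100 nodes:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: training-job
spec:
  nodeSelector:
    nvidia.com/gpu.product: Tesla-V100-SXM2-16GB  # hypothetical label value from the NFD sidecar
  containers:
  - name: train
    image: registry.example.com/trainer:latest    # hypothetical image
```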
B: Every accelerator stack is different: some need more features, some need fewer, but SRO is extensible enough that you can add as many states as you want, or just leave it at the driver container. SRO is also capable of handling several driver containers for several vendors, and it really depends on the accelerator stack what is actually needed and what needs to be deployed. We will also see how SRO can handle updates in an OpenShift cluster.
B
And
other
stuff
directly
into
the
manifest
you
are
exposing
and
all
of
this
I
see
it
sro
as
a
small
state
machine,
because
if
you
are
deploying
the
driver
container,
it
makes
no
sense
to
deploy
the
device
plug-in
in
parallel,
because
it
will
just
not
work.
We
have
some
parallelism.
We
are
steering
everything
with
labels,
so
in
sro
we
have
a
a
sequential
approach
and
then,
where
it's
capable,
we
are
also
running
stuff
in
parallel.
B
So,
for
example,
when
we
are
deploying
the
device
plug-in,
we
are
running
the
monitoring
feature
discovering
and
other
parts
in
parallel
to
enable
it
in
sro
and
sro
has
been
used
with
several
vendors.
Now
prime
example
is
now
nvidia,
which
we
have
a
solar
flare,
also
used
sro
to
enable
their
hardware,
and
I
will
get
back
to
another
use
case
where
we
use
sro
as
a
visibility
study
or
as
a
poc
vehicle
to
enable
hardware
on
on
openshift.
B
Besides
that,
sro
also
supports
some
more
configuration.
We
can
hit
hard
or
soft
petitioning.
I
will
come
back
to
that
later.
We
can
choose
the
driver
versions,
we
can
use,
and
some
other
configurations
for
driver
container
building
and
all
of
those
states
are
driven
by
custom,
manifests
that
are
in
a
conflict
map.
So
during
the
run
time
of
sro,
you
can
live
edit.
B
Let
me
get
into
detail
how
sero
is
dealing
with
updates.
So,
let's
start
with
the
normal
use
case,
we
have
deployed
nfd
and
sro,
so
the
first
step
is
nft
detects
the
kernel
version
and
labels,
the
node.
We
have
a
worker
node
kernel
running
for
18080,
the
nft
worker
diem
set
detects.
It
sends
a
message
to
the
nfd
master
and
the
nfd
master
through
the
kubernetes
api
labels,
the
node
and
says:
okay,
I'm
running
with
kernel,
418
080..
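After this step the node object carries a label like the following (the node name is hypothetical; the label key is NFD's standard kernel feature label, and the value is abbreviated):

```yaml
apiVersion: v1
kind: Node
metadata:
  name: worker-gpu-0   # hypothetical node name
  labels:
    feature.node.kubernetes.io/kernel-version.full: 4.18.0-80   # written by the NFD master
```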
B: The next step is that SRO reads those labels, uses BuildConfigs to build a driver container for 4.18.0-80, and pushes it to the local image registry.
B: So what happens if an update comes in? NFD will, of course, detect that we have a new kernel version and will relabel the node. SRO will detect this mismatch and use the new information: it gets injected into the BuildConfig, which effectively says, okay, now build me the driver container for this specific kernel version, and the result is pushed again to the image registry.
B: SRO will then update the GPU driver DaemonSet with the new version; the new image will be pulled by the driver DaemonSet, which is restarted, and the complete stack is restarted as well, so that all the changes that come with the new driver are applied to the complete stack that is already deployed.
B: The Special Resource Operator is a way to enable special resources the OpenShift way. I've included several links here; the first link is the GitHub repository.
B: The next link is about using entitled image builds to build driver containers with UBI on OpenShift, which is what we use to build the driver containers so we always stay in sync with the kernel versions and the user-space libraries, on top of tested, bug-fixed, and CVE-checked UBI images. I've written two blog posts, part one and part two, for those curious about the inner workings of SRO. Part two really covers how the API works, how OpenShift and Kubernetes work internally, and which SRO building blocks we provide to enable hardware.
B: The last link covers how to deploy the NVIDIA GPU operator on OpenShift and use the cluster autoscaler to scale the RAPIDS image. For those curious about the relation between the NVIDIA GPU operator and SRO: the NVIDIA GPU operator is a fork, a one-to-one copy, of SRO. After NVIDIA picked up SRO, they of course relabeled it and extended it for their needs, but mostly it works like I just described.
B: So what else can we do with GPUs? We have the initial GPU cluster configuration with one GPU node. What works today is using the cluster autoscaler for on-demand GPU nodes: if you have, for example, a pod pending with the extended resource nvidia.com/gpu, you can create a cluster autoscaler for specific MachineSets to autoscale GPU nodes. We are also currently working on enabling HPA to work with GPUs.
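A sketch of the autoscaling piece, reusing the hypothetical GPU MachineSet name from earlier:

```yaml
apiVersion: autoscaling.openshift.io/v1beta1
kind: MachineAutoscaler
metadata:
  name: gpu-machineautoscaler
  namespace: openshift-machine-api
spec:
  minReplicas: 1
  maxReplicas: 4
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: mycluster-gpu-us-east-1a   # hypothetical GPU MachineSet
```

Combined with a ClusterAutoscaler resource, a pending pod requesting nvidia.com/gpu triggers a scale-up of this MachineSet.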
B: What we want with hard partitioning is to schedule only GPU pods on the GPU nodes, without having any CPU pods interfering with our GPU nodes. The other way is soft partitioning: we allow CPU and GPU pods on the GPU nodes, but with priorities, so any GPU workload, any pod that comes in with a high priority versus a low priority, is first to run on the GPU node rather than a CPU pod.
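A sketch of both patterns together; the taint key and PriorityClass name are hypothetical, things you would define yourself:

```yaml
# Soft partitioning: a custom priority for GPU workloads.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: gpu-high-priority   # hypothetical name
value: 1000000
globalDefault: false
description: GPU workloads run first on GPU nodes
---
# Hard partitioning: the pod tolerates a taint applied to the GPU nodes,
# e.g. via: oc adm taint nodes <gpu-node> nvidia.com/gpu=:NoSchedule
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload
spec:
  priorityClassName: gpu-high-priority
  tolerations:
  - key: nvidia.com/gpu     # hypothetical taint key
    operator: Exists
    effect: NoSchedule
  containers:
  - name: app
    image: registry.example.com/cuda-app:latest  # hypothetical image
    resources:
      limits:
        nvidia.com/gpu: 1
```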
B
We
can
also,
of
course,
combine
those
patterns
of
partitioning
combining
both
that
we
are
only
running
gpu
nodes,
but
now
we
have
also
high
priority
and
low
priority
running
on
those
gpu
nodes.
B
Another
thing
that's
working
with
gpus
is
coders,
so
you
can
have
coders
per
namespace,
multiple
namespaces
for
cpu,
mem
and,
of
course,
for
extended
resources
like
the
gpu.
With
all
these
features,
you
can
go
on
and
maybe
create
even
some
more
roles.
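A minimal quota sketch for a hypothetical namespace, capping CPU, memory, and the GPU extended resource:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota
  namespace: team-a                # hypothetical namespace
spec:
  hard:
    requests.cpu: "16"
    requests.memory: 64Gi
    requests.nvidia.com/gpu: "4"   # extended resources can be capped per namespace too
```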
B
You
can
have
clustering
nodes
with
specific
roles
for
gpu
training
or
for
gpu
inferencing,
depending
on
your
gpu
type
or
what
or
what
you're
exactly
want
to
do
in
your
gpu
cluster
by
default,
the
infra-only
parts
and
master
nodes
are
already
tainted,
so
you
could
also
have
things
on
your
cpu
nodes
and
priorities.
B
The
other
thing
I
want
to
mention,
if
you
have
several
gpu
nodes,
you
want
to
have
them
interconnected
with
a
high
high
speed
interconnect.
So
this
is
where
maltese
comes
into
place
and
for
those
people
who
don't
know
what
maltese
is
maltese
provides
secondary
interfaces
to
the
parts,
so
you
can
have
a
separate
data
plane
for
your
gpu
node.
So
that
is
important,
worry
that
you
are
interacting
with
the
api.
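A sketch of such a Multus secondary network, assuming a macvlan attachment on a hypothetical host interface:

```yaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: gpu-data-plane   # hypothetical name
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "ens2f0",
      "ipam": { "type": "host-local", "subnet": "192.168.100.0/24" }
    }
```

A pod joins this data plane by adding the annotation k8s.v1.cni.cncf.io/networks: gpu-data-plane; its primary interface stays on the cluster network.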
B
So
with
the
gpu
nodes
and
nv
stack,
you
already
have
something
like
a
separate,
compute,
plane,
independent
and
out
of
reach
of
the
api
you're,
just
running
your
workload
on
the
gpus
and
now
not
to
use
the
internal
network,
but
the
dedicated
network.
You
can
use
multis
and
what
we
are
currently
working
on.
Is
we've
used
sro
to
show
and
make
a
poc
how
to
enable
rdma,
completely
containerized
in
in
openshift
with
running
either
over
infiniband
or
ethernet.
B
So
all
the
pieces,
like
the
moffett,
stack,
the
gpu
direct
driver
and
all
other
pieces
that
we
need
for
gpu
interconnects
are
containerized
and
running
on
reddit
cores,
the
company
or
the
vendor.
We
are
currently
working
on
used
again
sro
as
the
driving
vehicle
to
test
this
poc,
and
now
they
are
implementing
on
top
of
sro,
and
with
this
with
this
template,
their
own
operator
to
enable
rdma
or
infinite
or
ethernet
or
or
gpus.
B: I've also included a link to a Google doc on how to enable GPUs with OKD 4.5. As for the future work we are doing: one piece is GPUDirect, to enable fast interconnects, and the other is MIG support. MIG support is a feature of the new GPUs for slicing up the GPU into several distinct pieces: a big GPU can be sliced into up to seven pieces, and those seven slices are recognized by OpenShift as seven dedicated GPUs.
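MIG slices then surface as their own extended resources, so a pod would request a slice rather than a whole GPU; the resource name below follows NVIDIA's MIG naming scheme and is an assumption for illustration:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mig-workload
spec:
  containers:
  - name: app
    image: registry.example.com/cuda-app:latest  # hypothetical image
    resources:
      limits:
        nvidia.com/mig-1g.5gb: 1  # one 1g.5gb slice of a big GPU (assumed resource name)
```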
A: Indeed we do. Zvonko, that was a really great presentation. Someone else is asking: it's available on OKD; is it also available on top of FCOS? I'm assuming that's a CoreOS operating system, but I could be wrong.
B: That's Fedora CoreOS. I just switched to my terminal: oc get clusterversion.
B
We
have
here
a
okd
4.5
running.
We
don't
use
s
0
here
we
can
use
the
nvidia
gpu
operator
here.
I've
included
the
instruction
how
to
use
it
oc
get
part.
We
have
all
the
usual
suspects
on
okd
running
and
if
I
do
aoc
get
notes,
this
is
my
last
oc
describe
note.
B
Rep
10
de.
I
just
want
to
show
that
nfd
works
out
of
out
of
the
box.
We
are
currently
working
on
edit
to
the
community
operators
on
operator
hub
so
that
it
shows
up
in
operator
for
opd
as
well.
So
we
have
a
gpu
in
here
and
if
we
grab
for
the
extended
resource,
we
can
see
that
we
have
one
gpu
in
this
cluster
zeros
are
allocated,
but
we
have
one
available
and
we
are
running.
B
Okd,
let's
see
debug
node.
B: Looking at the OS release, we are running Fedora CoreOS 32 here. We are working with NVIDIA to add Fedora support to their driver; currently we have to build our own driver container and push it to a registry, but we are working with NVIDIA to add this to the official repositories.
A: Great. So we also have a question from James; he's asking: is that AMD MxGPU?
A: I guess we could answer both: are we engaging with AMD right now, and if not, is there a plan to?
B
I'm
not
using
any
amd
gpu
but
amd
has
all
the
pieces
I
described
here
already
posted
they
have
a
pre-start
hook.
They
have
the
device
plug-in,
they
do
not
have
any
metrics,
they
don't
have
any
monitoring.
So
technically
it
should
work.
There
were
some
discussions
on
enabling
amd
here.
A: Okay, thank you. So we have a question here; someone's asking: so if I understand correctly, we do not change the Red Hat CoreOS or Fedora CoreOS host. The operator will install a driver container and push it to the OpenShift/OKD default registry, and that enables the GPUs on OKD. Is that correct?
B
I
just
have
that
question
yeah.
I
understand
the
question
so,
okay,
if
what
we
wanted
from
the
beginning,
even
when
started
with
atomic
on
openshift
3.8,
we
didn't
want
to
touch
the
host
at
all,
so
we
are
not
doing
anything
on
the
host
operating
system.
Why
we
want
to
make
updates
as
much
if
possible.
So
we
don't
want
to
write
somewhere
to
some
suspicious
directories
like
slash,
opt
and
then
clean
up
afterwards.
B
If
we
reboot
the
node,
we
don't
want
to
have
any
traces
of
anything
on
the
node,
so
everything
is
in
the
container.
S0
pushes
us
to
the
internal
registry.
Nvidia
has
used
a
different
path.
They
are
building
their
drivers
on
demand
in
the
driver
container.
So
if
you
are
deploying
the
nvidia
gpu
operator,
it
will
build
the
driver
on
demand
right
away,
as
I
pull
it
on
the
cluster.
A: All right, thanks for that, Zvonko. We also have another question from YouTube; Daniel's asking: are kernel headers included in Red Hat CoreOS?
B
I've
posted
one
of
the
links
on
nfd
cloud
on
the
nfd
no
teacher
discovery
page
is
the
link
how
to
entitle
your
cluster,
how
to
build
red
hat
software,
so
you
need
a
subscription
to
get
kernel,
devil,
kernel,
headers
and
kernel
core,
that's
needed
for
a
kernel
driver
in
the
red
hat
chorus
case.
You
need
an
entitled
cluster
to
get
access
to
this
to
the
software
on
federal
core
s.
We
don't
need
it,
there's
no
entitlement.
We
can
use
it
right
away.
B: One aspect is the granularity: fine-grained scheduling on the type of GPU and the features of the GPU. That would be, say, targeting the workload at V100s or T4s. The other aspect is the priorities. For every pod you can set a priority that you define yourself, and if you are using taints, only GPU workloads will run on a specific GPU node that carries those taints; we are excluding the CPU workloads if you are using taints.
B: That would be the hard partitioning part, where we repel all CPU workloads; and then inside a GPU node, or namespace, or however you cluster your GPU nodes, you would have priorities, and you can add as many priorities as you want. I just included high and low, but you can have more fine-granular priority classes, from one to ten or one to five; it really depends on what you want to achieve. Higher-priority pods, of course, run first and will evict lower-priority pods.
B: You can be sure that your high-priority pods run first on a GPU node.
A: Thank you. So here's another one, from Sandrino. He says a month ago he tried the NVIDIA operator on a test cluster, and the NVIDIA operator was trying to fetch kernel headers from the extended update support (EUS) repositories, and the entitlement does not cover EUS. So: does the flow that you showcased today solve the EUS issue?
B: Another enhancement we are working on is to provide a simpler way to build driver containers. This is currently in the works: hiding all those nasty details from customers, like where to fetch the kernel headers, whether to enable the extended update support repositories, or whether to fetch them from somewhere else. We are building something like a base image for customers so that they can easily build driver containers without thinking about which kernel version they are running or where to get the dependencies.
A: Thank you, Zvonko. I'm just checking if we have any other questions. W says thank you for clarifying the question. And, oh, we have another question here: will the feature discovery sidecar enable vGPU, that is, scheduling on something that is not a whole GPU?
B: If you're running, for example, OpenStack and install the vGPU compute server from NVIDIA, it will work to pass vGPUs through into VMs, and then, if you're installing OpenShift on OpenStack, you just need to install the NVIDIA GPU operator and it will work. The vGPU piece is on a different level: it's not on the OpenShift level, it's on the infrastructure level.
A: Okay, all right, thank you. So we also have another question here: is the scheduler aware of the GPU load?
B: You could do that. For example, SRO has sample implementations of the metrics and alerts. We alert the customer, for example, if the GPU is idle and they're wasting money, or if the temperature is too high and they have a problem with cooling. You could do the same thing for pressure: you could take a specific metric that tells you how much memory is used or how many cores are used, and expose this as an alert.
B: If a given threshold is exceeded, you can alert the cluster administrator, or the user who is sitting in front of the console, that, say, 99 percent of the GPU is used. But this will not otherwise be fed back into scheduling decisions.
A
Okay,
so
is
it
possible?
Is
it
possible
for
the
ports
to
share.
A: Okay, okay, yes. We'll jump back to the GPUs again: so, vSphere, if one enables vGPU licensing, then OpenShift, via the NVIDIA GPU operator...
A: I'm sorry, let me... I'm trying to understand this question as well. So: if one enables vGPU licensing, then OpenShift, via the NVIDIA GPU operator, will be able to see this?
B
Yeah
yeah
everything
that
is
visible
by
the
operating
system
and
exposed
as
a
pci
device,
which
is
done
in
vsphere
or
in
openstack,
which
is
essentially
like
something
like
a
pass-through,
pci
pass-through
and
the
operating
system.
That
is
deployed
on
top
of
this
infrastructure.
B
B: And if the GPU is detected like a normal device, which I suppose it is, the NVIDIA GPU operator will work.
A: Okay, okay, that's clear. Give me a second, let me see if we have any more questions coming in.
A: Okay, I think we're clear. Thank you so much for taking the questions. (B: My pleasure.) Yeah, that was a very useful session, especially right now with GPUs being such a hot topic. I'm just going to give it a few more seconds to see if we're going to get any more questions.
A: Okay, we're clear. So thank you so much; this was a very great and timely session, always insightful, always educational. Huge shout-out to Chris Short as well for promoting this session and making the live stream very seamless. And I guess with that, we'll let you all get back to your days. Thank you so much for joining.