youtube image
From YouTube: Improving GPU Utilization using Kubernetes - Maulin Patel & Pradeep Venkatachalam, Google

Description

Don’t miss out! Join us at our upcoming hybrid event: KubeCon + CloudNativeCon North America 2022 from October 24-28 in Detroit (and online!). Learn more at https://kubecon.io. The conference features presentations from developers and end users of Kubernetes, Prometheus, Envoy, and all of the other CNCF-hosted projects.

Improving GPU Utilization using Kubernetes - Maulin Patel & Pradeep Venkatachalam, Google

Kubernetes supports efficient utilization of resources by enabling applications to request the precise amounts of resources it needs. Unlike fractional requests for CPUs, fractional requests for GPUs are not allowed in Kubernetes. GPU resources requested in the pod manifest must be an integer number. This means one GPU is fully allocated to one container even if the container only needs a fraction of GPU for its workload. Without the support for fractional GPUs, GPU resources are invariably over provisioned leading to a wastage. This is especially true for inference workloads that process a handful of data samples in real-time. To address this limitation, we have developed user-friendly solutions that allow a single GPU to be shared by multiple containers thereby improving utilization of GPUs and saving cost. In this talk, we will show the demos of our solutions and share performance results.