From YouTube: Machine Learning Using Various GPU Technology With Kubeflow. - Jihye Choi, SAMSUNG SDS

Description

Speakers: Jihye Choi
Anyone who works in MLOps knows that budgets and GPUs are limited, so using them well is crucial. Kubeflow is a great open source project, but it offers few features for efficient distributed training, whether through tight coupling with GPUs or by maximizing GPU utilization. This presentation covers two GPU technologies that address this gap:

1. A small model needs only a fraction of a GPU, so dedicating an entire GPU to it wastes resources. Multi-Instance GPU (MIG), available on the NVIDIA A100, splits one GPU into up to 7 instances. This presentation shows how to combine this technology with Kubeflow.
2. As model size increases, distributed training across multiple GPU servers becomes necessary. GPUDirect RDMA is a high-performance networking technology that transfers data directly to and from GPU memory without involving the CPU or system memory.

As a result, attendees come away with tried-and-true practices for improving GPU utilization and performance in Kubeflow.
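To make the MIG idea concrete, here is a minimal sketch (not taken from the talk) of a Kubernetes Pod manifest that requests one MIG slice instead of a whole GPU. It assumes the cluster runs the NVIDIA device plugin with the "mixed" MIG strategy, which exposes slices under resource names such as nvidia.com/mig-1g.5gb. The function name, pod name, and container image are hypothetical placeholders.

```python
import json


def mig_pod_manifest(name, image, mig_profile="1g.5gb", count=1):
    """Build a Pod manifest (as a plain dict) that requests MIG slices.

    Assumption: the NVIDIA device plugin advertises MIG slices as
    nvidia.com/mig-<profile> (the "mixed" MIG strategy). The default
    profile, 1g.5gb, is the smallest slice on an A100 40GB, which can
    hold up to 7 of them.
    """
    resource_name = f"nvidia.com/mig-{mig_profile}"
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "trainer",
                    "image": image,  # hypothetical image name
                    # Extended resources such as GPUs are requested via limits.
                    "resources": {"limits": {resource_name: count}},
                }
            ],
        },
    }


if __name__ == "__main__":
    manifest = mig_pod_manifest("small-model-train", "example.com/trainer:latest")
    print(json.dumps(manifest, indent=2))
```

The same resource limit could be placed in the container spec of a Kubeflow notebook or training job, so that up to seven small workloads share one A100 rather than each holding a full GPU.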