From YouTube: Scaling AI Inference Workloads with GPUs and Kubernetes - Renaud Gaubert & Ryan Olson, NVIDIA

Description

Deep Learning (DL) is a computationally intensive form of machine learning that has revolutionized many fields, including computer vision, automated speech recognition, natural language processing, and artificial intelligence (AI). DL impacts every vertical market, from automotive to healthcare to cloud. As a result, the training and deployment of Deep Neural Networks (DNNs) have shifted datacenter workloads from traditional CPUs to AI-specific accelerators like NVIDIA GPUs. Leveraging several popular CNCF projects such as Prometheus, Envoy, and gRPC, we will demonstrate an implementation of NVIDIA’s reference scale-out inference architecture, capable of delivering petaops per second of performance. Serving inference at this scale is a new and challenging problem in the datacenter. We will discuss the challenges it raises and ways to optimize for service delivery metrics (latency and throughput), cost, and redundancy.

To Learn More: https://sched.co/GrVq