
Description

Don’t miss out! Join us at our upcoming hybrid event: KubeCon + CloudNativeCon North America 2022 from October 24-28 in Detroit (and online!). Learn more at https://kubecon.io. The conference features presentations from developers and end users of Kubernetes, Prometheus, Envoy, and all of the other CNCF-hosted projects.

How Cookpad Leverages Triton Inference Server To Boost Their Model Serving - Jose Navarro & Prayana Galih, Cookpad

The adoption of MLOps practices and tooling by organizations has considerably reduced the pain points of productionising Machine Learning models. However, as the number of models a company deploys grows, along with the diversity of frameworks used to train those models and the different infrastructure each model requires, new challenges arise for Machine Learning Platform teams, e.g.: How can we deploy new models from the same or different frameworks concurrently? How can we improve throughput and optimise resource utilisation in our serving infrastructure, especially GPUs? In this session, Cookpad's ML Platform Engineers will discuss how Triton Inference Server, an open-source model-serving tool from NVIDIA, can simplify the process of model deployment and optimise resource utilisation by efficiently running concurrent models on single-GPU, CPU-only, and multi-GPU servers.
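The concurrent-model serving the abstract describes is driven by Triton's per-model configuration file. As a rough illustration (not from the talk itself), a `config.pbtxt` like the sketch below tells Triton to run multiple instances of one model on a single GPU and to batch requests server-side; the model name, framework, and tensor shapes here are invented for the example.

```protobuf
# config.pbtxt — illustrative Triton model configuration
# (model name, platform, and tensor shapes are hypothetical)
name: "recipe_ranker"
platform: "onnxruntime_onnx"
max_batch_size: 32

input [
  {
    name: "input_ids"
    data_type: TYPE_INT64
    dims: [ 128 ]
  }
]
output [
  {
    name: "scores"
    data_type: TYPE_FP32
    dims: [ 1 ]
  }
]

# Run two copies of the model concurrently on GPU 0; Triton schedules
# incoming requests across the instances to raise GPU utilisation.
instance_group [
  {
    count: 2
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]

# Combine individual requests into batches server-side, waiting at most
# 100 microseconds to fill a batch, trading a little latency for throughput.
dynamic_batching {
  max_queue_delay_microseconds: 100
}
```

Placing a configuration like this next to each model in the model repository is how a single Triton server can host models from different frameworks side by side, which is the utilisation problem the session addresses.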