youtube image
From YouTube: Enhancing the Performance Testing Process for gRPC Model Inferencing at S... Ted Chang, Paul Van Eck

Description

Enhancing the Performance Testing Process for gRPC Model Inferencing at Scale - Ted Chang, Paul Van Eck, IBM

Performance testing is a critical part of software development that helps us to identify bottlenecks early on and avoid costly crashes that impact operation. When it comes to thousands of machine learning models of many different formats and sizes, ensuring that users can perform inference on these models in reasonable time is paramount. In this session, we show how a Kubernetes cluster is set up with KServe's ModelMesh to enable the high-density deployment of models for gRPC inference. Then, we demonstrate how we load test several thousands of models, and how Prometheus and Grafana are used to illustrate and monitor key performance metrics.