From YouTube: How We Built an ML Inference Platform with Knative - Dan Sun, Bloomberg LP & Animesh Singh, IBM

Description


Deploying and scaling machine learning (ML)-driven applications in production is rarely a simple task. However, serverless inference has been simplified and accelerated through the use of Knative. Knative runs serverless containers on Kubernetes with ease and handles the details of networking, request-volume-based autoscaling (including scale-to-zero), and revision tracking. It also enables event-driven applications by integrating seamlessly with various event sources. In this session, the speakers will discuss why their organizations initially chose Knative when building their ML inference platforms, and how these efforts evolved into the KServe project (github.com/kserve). We will also discuss how we leverage Knative to implement blue/green/canary rollout strategies for safe production updates to our ML models, improve GPU utilization with scale-to-zero functionality, and build an Apache Kafka event-based inference pipeline. At the end of the talk, we will share some of our testing benchmarks (compared with the Kubernetes HPA), as well as performance optimization tips that have enabled us to run hundreds to thousands of Knative services in a single cluster.
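To make the two techniques above concrete, here is a minimal sketch of the kind of Knative Service manifest that enables scale-to-zero and a canary traffic split between revisions, built as a Python dict. This is an illustration, not the speakers' actual configuration: the service name `sentiment-model`, the image registry, and the 90/10 split are hypothetical, and the `autoscaling.knative.dev/min-scale` and `autoscaling.knative.dev/target` annotations follow the Knative autoscaling documentation.

```python
import json

def make_canary_service(name: str, canary_percent: int) -> dict:
    """Build an illustrative Knative Service manifest with scale-to-zero
    enabled and `canary_percent` of traffic routed to the latest revision.
    All names here are hypothetical examples, not from the talk."""
    return {
        "apiVersion": "serving.knative.dev/v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "metadata": {
                    "annotations": {
                        # "0" lets an idle revision scale down to zero pods,
                        # which is what frees up otherwise-idle GPUs.
                        "autoscaling.knative.dev/min-scale": "0",
                        # Target concurrent requests per pod for the
                        # Knative Pod Autoscaler (KPA).
                        "autoscaling.knative.dev/target": "10",
                    }
                },
                "spec": {
                    "containers": [
                        # Hypothetical model-server image.
                        {"image": f"registry.example.com/{name}:latest"}
                    ]
                },
            },
            "traffic": [
                # A pinned stable revision keeps most production traffic.
                {"revisionName": f"{name}-stable",
                 "percent": 100 - canary_percent},
                # The latest revision receives the canary slice; if it
                # misbehaves, dropping its percent back to 0 rolls back.
                {"latestRevision": True, "percent": canary_percent},
            ],
        },
    }

manifest = make_canary_service("sentiment-model", canary_percent=10)
print(json.dumps(manifest, indent=2))
```

In practice a manifest like this would be applied with `kubectl apply`; Knative then creates a new revision on each update, and adjusting the `percent` values shifts traffic gradually from the stable revision to the canary.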