19 May 2022
A Component Registry for Kubeflow Pipelines - Christian Kadner, IBM
Kubeflow Pipelines are widely used to orchestrate machine learning (ML) workflows on Kubernetes. Pipelines and individual pipeline stages are often worked on collaboratively. To facilitate that process, Kubeflow Pipelines supports reusable components: self-contained sets of code that each perform one step in the ML workflow, such as data preprocessing, data transformation, model training, and model serving. There is a rich set of components from the community and from vendors. What has been missing from the ecosystem, however, is a registry for sharing reusable components with the public or among teams of data scientists. Thus, many of the common tasks required to run ML workflows on Kubernetes, like creating secrets, persistent volume claims, and config maps, have to be implemented again and again. A component registry can provide a rich catalog of components that solve those common tasks and ease the burden of creating ML workflows on Kubernetes.
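As a sketch of the idea, a reusable component can be thought of as a self-contained function plus declared inputs and outputs, and a registry as a catalog that teams publish to and pull from. The following minimal pure-Python illustration uses hypothetical names throughout; it is not the KFP SDK or the registry API discussed in the talk:

```python
# Minimal sketch of a component registry: components are self-contained
# functions with declared inputs/outputs, published under a name and version.
REGISTRY = {}

def register(name, version, inputs, outputs):
    """Decorator that publishes a component into the catalog."""
    def wrap(fn):
        REGISTRY[(name, version)] = {"inputs": inputs, "outputs": outputs, "run": fn}
        return fn
    return wrap

@register("preprocess", "1.0", inputs=["raw_data"], outputs=["clean_data"])
def preprocess(raw_data):
    # Stand-in for a common task, e.g. dropping empty records before training.
    return [row for row in raw_data if row]

def lookup(name, version):
    """Fetch a shared component instead of re-implementing it."""
    return REGISTRY[(name, version)]

component = lookup("preprocess", "1.0")
print(component["run"](["a", "", "b"]))  # ['a', 'b']
```

The point of the catalog is the lookup step: a second team calls `lookup("preprocess", "1.0")` rather than rewriting the preprocessing code.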
- 1 participant
- 29 minutes
19 May 2022
A Deep Dive into Kubeflow Pipelines - Senthil Raja Chermapandian, Ericsson
A machine learning model is only a tiny piece in a series of processing steps executed as part of an ML workflow. A pipeline is a description of an ML workflow, including all the components in the workflow and how they combine in the form of a graph. Kubeflow Pipelines (KFP) is an open-source project that helps run cloud-native ML pipelines on Kubernetes. While most previous talks on KFP have focused on data scientists and data engineers, this talk will dive deep into KFP, covering its architecture, its platform components, and how those components work together to execute a workflow.
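The "workflow as a graph" idea above can be made concrete with a small, self-contained sketch: each step lists the steps it depends on, and the engine derives a valid execution order by topological sort. The step names are illustrative, not KFP's actual scheduler:

```python
from graphlib import TopologicalSorter  # Python 3.9+ standard library

# A pipeline described as a graph: each step maps to the steps it depends on.
pipeline = {
    "preprocess": set(),
    "transform":  {"preprocess"},
    "train":      {"transform"},
    "serve":      {"train"},
}

# The workflow engine must run steps in an order that respects the edges.
order = list(TopologicalSorter(pipeline).static_order())
print(order)  # ['preprocess', 'transform', 'train', 'serve']
```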
- 1 participant
- 32 minutes
19 May 2022
Building AutoML Pipelines With Argo Workflows and Katib - Andrey Velichkevich, Apple + Johnu George, Nutanix
The fairly recent field of Automated Machine Learning (AutoML) provides a wealth of powerful algorithms for model selection and hyperparameter (HP) tuning, one of the most important steps of the MLOps lifecycle. Katib is a popular Kubernetes-native open source project for AutoML. Katib can tune HPs for models written in any framework, such as TensorFlow, PyTorch, MXNet, and scikit-learn. To find the best HPs, metrics are evaluated after a model training step. Usually, model training is a complex process that includes data preprocessing, data validation, the actual training, and more. This whole lifecycle can be represented as a workflow dependency graph by specifying the dependencies between model operations. Argo Workflows provides a great container-native workflow engine for orchestrating jobs on Kubernetes, which makes it an ideal candidate for Katib experiments. This talk will demonstrate how Argo Workflows integrates natively into the Katib infrastructure.
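Independent of Katib or Argo, the core HP-tuning loop can be sketched as random search over a space: sample a trial, train and evaluate, and keep the best result. In the self-contained illustration below, the quadratic objective is a placeholder for a real train-then-evaluate step, and the parameter names are hypothetical:

```python
import random

def objective(lr, depth):
    # Placeholder for "train a model and report a validation metric";
    # this toy surface peaks at lr=0.1, depth=6 with a maximum value of 0.
    return -((lr - 0.1) ** 2) - 0.01 * (depth - 6) ** 2

search_space = {"lr": (0.001, 1.0), "depth": (2, 10)}
random.seed(0)

best = None
for _ in range(50):  # 50 trials, as a fixed experiment budget
    trial = {
        "lr": random.uniform(*search_space["lr"]),
        "depth": random.randint(*search_space["depth"]),
    }
    score = objective(**trial)
    if best is None or score > best[0]:
        best = (score, trial)

print(best[1])  # hyperparameters of the best trial found
```

Katib's experiment controller generalizes this loop: the search algorithm proposes trials, each trial runs as a workload on Kubernetes, and reported metrics drive the next suggestions.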
- 2 participants
- 22 minutes
19 May 2022
Closing Remarks - Alex Collins, Intuit + Jessica Andersson, Annotell
- 2 participants
- 5 minutes
19 May 2022
Computer Vision Dog Breed Classification with Convolutional Neural Networks, TensorFlow and Kubeflow - Konstantinos Andriopoulos, Dorothea Kalliora, Arrikto
Sick of strangers at the dog park constantly commenting on how good-looking your pup is, but at a loss when they ask, “What breed is it?” Me too! Why not use AI to answer the question for you? For data scientists looking for an open-source, scalable way to tackle these sorts of problems, Kubernetes and Kubeflow make analyzing content in images and video much easier than trying to build everything from scratch and run it on bare metal or VMs. In this talk, we’ll work through the development of a notebook that leverages the combined powers of TensorFlow, ResNet-50 models, convolutional neural networks, VGG16, and transfer learning to see how accurately these algorithms can predict the breed of my dog. Spoiler alert! I have the genealogy results, so there will be a big reveal with DNA pitted against a variety of algorithms.
- 5 participants
- 29 minutes
19 May 2022
Debugging Machine Learning on the Edge with MLExray - Michelle Nguyen, Stanford
- 4 participants
- 27 minutes
19 May 2022
Efficient AutoML with Ludwig, Ray, and Nodeless Kubernetes - Anne Marie Holler, Elotl + Travis Addair, Predibase
The open-source platforms Ludwig and Ray make deep learning (DL) accessible to diverse users by reducing the complexity barriers to training, scaling, deploying, and serving DL models. Recently, Ludwig was extended to support AutoML for tabular datasets (v0.4.1) and for text classification datasets (v0.5.0), using Ray Tune for hyperparameter search. In this talk, we discuss how Ludwig AutoML exploits heuristics, developed using a set of training datasets, to efficiently produce models for validation datasets. We also show how running Ludwig AutoML on cloud Kubernetes clusters, using Nodeless K8s to add right-sized GPU resources when they are needed and remove them when they are not, reduces cost and operational overhead versus running directly on EC2.
- 3 participants
- 28 minutes
19 May 2022
Enhancing the Performance Testing Process for gRPC Model Inferencing at Scale - Ted Chang, Paul Van Eck, IBM
Performance testing is a critical part of software development that helps us identify bottlenecks early on and avoid costly crashes that impact operation. When it comes to thousands of machine learning models of many different formats and sizes, ensuring that users can perform inference on these models in reasonable time is paramount. In this session, we show how a Kubernetes cluster is set up with KServe's ModelMesh to enable high-density deployment of models for gRPC inference. Then, we demonstrate how we load test several thousand models, and how Prometheus and Grafana are used to visualize and monitor key performance metrics.
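The key metrics in such load tests are typically latency percentiles. As a small, self-contained illustration, the sketch below computes p50/p95/p99 from recorded per-request latencies; the samples are simulated here, whereas a real run would record gRPC inference round-trip times:

```python
import random

def percentile(samples, pct):
    """Nearest-rank percentile of a list of latency samples."""
    ordered = sorted(samples)
    rank = max(0, min(len(ordered) - 1, round(pct / 100 * len(ordered)) - 1))
    return ordered[rank]

random.seed(42)
# Simulated round-trip latencies (ms) for 1000 inference requests.
latencies = [random.gauss(mu=25, sigma=5) for _ in range(1000)]

report = {p: percentile(latencies, p) for p in (50, 95, 99)}
print({p: round(v, 1) for p, v in report.items()})
```

In practice these percentiles would be exported to Prometheus and graphed in Grafana rather than computed ad hoc, but the quantities being monitored are the same.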
- 4 participants
- 32 minutes
19 May 2022
Exploring ML Model Serving with KServe (with fun drawings) - Alexa Nicole Griffith, Bloomberg
KServe (formerly known as KFServing) provides an easy-to-use platform for deploying machine learning (ML) models. KServe is built on top of Kubernetes and provides performant, high-abstraction interfaces that allow data scientists to spend more time focusing on building new models and less time worrying about the underlying infrastructure. This open source project provides a simple, pluggable solution for common infrastructure issues with inference models, like GPU scaling and ModelMesh serving for high-volume/high-density use cases. From the perspective of an eager engineer new to the KServe community, we will explore the KServe features that solve common issues for engineers and data scientists who are interested in, or responsible for, machine learning model deployment. Expect to learn about KServe’s fundamental offerings, like out-of-the-box model serving and monitoring, as well as its exciting new advanced functionalities, such as its inference graph capabilities and ModelMesh features. We will discuss the host of new features added to the project since its publication in 2019 and also outline KServe’s roadmap as it moves toward its v1.0 release.
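For context, deploying a model with KServe centers on a single InferenceService resource. The sketch below assembles such a manifest as a plain Python dict; the field layout follows KServe's commonly documented v1beta1 examples (an sklearn predictor reading a model from a storage URI), so treat the exact schema as an assumption to verify against the current KServe docs:

```python
def inference_service(name, storage_uri):
    """Assemble a minimal KServe InferenceService manifest as a dict.

    Mirrors KServe's v1beta1 examples; check current docs before relying
    on the exact field names.
    """
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {
            "predictor": {
                "sklearn": {"storageUri": storage_uri},
            },
        },
    }

svc = inference_service("sklearn-iris", "gs://kfserving-examples/models/sklearn/1.0/model")
print(svc["apiVersion"], svc["metadata"]["name"])
```

Applying a manifest like this to the cluster is all a data scientist needs to do; KServe handles routing, scaling, and the model server underneath.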
- 5 participants
- 28 minutes
19 May 2022
Kubernetes + AI Joining Forces in the Battle Against Cancer - Wojciech Małota-Wójcik, Ridge
A doctor is able to treat one patient at a time. On the other hand, engineers may create software analyzing thousands of cases every day! This presentation will focus on how geographically distributed Kubernetes clusters and AI are indispensable tools for IT professionals and computational biologists as they join forces to battle cancer. Computational biology — combining AI, medicine, mathematics, statistics, IoT and cloud computing — is increasing the precision and capacity of diagnostic processes. This evolution requires petabytes of storage, low-latency networking, efficient GPUs and ease of deployment on a massive scale. This leads us directly to the growing need for highly geographically distributed Kubernetes clusters, running as close to the hospital as possible. The presentation will review the basics of processing cancer images and then show actual examples of how AI algorithms are developed and deployed on K8s clusters and used by doctors to perform life-saving treatments.
- 1 participant
- 26 minutes
19 May 2022
Managing Multi-Cloud Apache Spark on Kubernetes - Ilan Filonenko, Aki Sukegawa, Bloomberg
Bloomberg has built multi-cloud quant platforms on top of Kubernetes to enable its users to develop sophisticated financial applications with integrated first-class data science capabilities. In this journey, it quickly became clear that managing data science infrastructure in a multi-cloud environment is challenging, especially when it comes to Apache Spark. While Kubernetes provides an excellent abstraction for designing composable infrastructure substrates, it comes with a list of challenges when dealing with auto-scaling, scheduling, preemption, and security. Given these challenges, this talk will explore how one can effectively manage an expansive Spark infrastructure solution that spans bare-metal and multiple public cloud platforms. We will also walk through various observability strategies, primarily focusing on how to surface cluster information to a varied group of Spark end-users by leveraging a variety of native Kubernetes resources, like node autoscalers, controllers, and custom PodConditions.
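Running Spark on Kubernetes is driven largely by `spark.kubernetes.*` configuration passed to spark-submit. The sketch below assembles a submit command from such a conf; the flag and property names follow Spark's documented Kubernetes settings, while the image, namespace, and application path are illustrative values:

```python
def spark_submit_args(master, image, app, conf):
    """Build a spark-submit argument list for a Kubernetes master."""
    args = ["spark-submit", "--master", master, "--deploy-mode", "cluster"]
    merged = {"spark.kubernetes.container.image": image, **conf}
    for key, value in sorted(merged.items()):
        args += ["--conf", f"{key}={value}"]
    return args + [app]

args = spark_submit_args(
    master="k8s://https://kube-apiserver:6443",   # illustrative API server
    image="example.registry/spark:3.2.1",          # illustrative image tag
    app="local:///opt/spark/app.py",
    conf={
        "spark.executor.instances": "4",
        "spark.kubernetes.namespace": "quant-platform",  # illustrative
    },
)
print(" ".join(args))
```

Keeping the conf as data, as here, is also what makes multi-cloud management tractable: the same application can be submitted against different clusters by swapping the master, image registry, and namespace.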
- 5 participants
- 31 minutes
19 May 2022
Sponsored Keynote: Challenges and Opportunities in Making AI Easy and Efficient with Kubernetes - Maulin Patel, Google
- 1 participant
- 10 minutes