19 May 2022
A Component Registry for Kubeflow Pipelines - Christian Kadner, IBM
Kubeflow Pipelines are widely used to orchestrate machine learning (ML) workflows on Kubernetes. Pipelines and individual pipeline stages are often worked on collaboratively. To facilitate that process, Kubeflow Pipelines supports reusable components: self-contained sets of code that each perform one step in the ML workflow, such as data preprocessing, data transformation, model training, and model serving. There is a rich set of components from the community and from vendors. What has been missing from the ecosystem, however, is a registry for sharing reusable components with the public or among teams of data scientists. Thus, many of the common tasks required to run ML workflows on Kubernetes, like creating secrets, persistent volume claims, and config maps, have to be implemented again and again. A component registry can provide a rich catalog of components that solve those common tasks and ease the burden of creating ML workflows on Kubernetes.
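As a sketch of the idea, a reusable component can be thought of as a self-contained function plus declared inputs and outputs, and a registry as a catalog that teams publish to and pull from. The following minimal pure-Python illustration uses hypothetical names throughout; it is not the KFP SDK or the registry API discussed in the talk:

```python
# Minimal sketch of a component registry: components are self-contained
# functions with declared inputs/outputs, published under a name and version.
REGISTRY = {}

def register(name, version, inputs, outputs):
    """Decorator that publishes a component into the catalog."""
    def wrap(fn):
        REGISTRY[(name, version)] = {"inputs": inputs, "outputs": outputs, "run": fn}
        return fn
    return wrap

@register("preprocess", "1.0", inputs=["raw_data"], outputs=["clean_data"])
def preprocess(raw_data):
    # Stand-in for a common task, e.g. dropping empty records before training.
    return [row for row in raw_data if row]

def lookup(name, version):
    """Fetch a shared component instead of re-implementing it."""
    return REGISTRY[(name, version)]

component = lookup("preprocess", "1.0")
print(component["run"](["a", "", "b"]))  # ['a', 'b']
```

The point of the catalog is the lookup step: a second team calls `lookup("preprocess", "1.0")` rather than rewriting the preprocessing code.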
- 1 participant
- 29 minutes
19 May 2022
A Deep Dive into Kubeflow Pipelines - Senthil Raja Chermapandian, Ericsson
A machine learning model is only a tiny piece in a series of processing steps executed as part of an ML workflow. A pipeline is a description of an ML workflow, including all the components in the workflow and how they combine in the form of a graph. Kubeflow Pipelines (KFP) is an open-source project that helps run cloud-native ML pipelines on Kubernetes. While most previous talks on KFP have focused on data scientists and data engineers, this talk will dive deep into KFP, covering its architecture, its platform components, and how those components work together to execute a workflow.
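The "workflow as a graph" idea above can be made concrete with a small, self-contained sketch: each step lists the steps it depends on, and the engine derives a valid execution order by topological sort. The step names are illustrative, not KFP's actual scheduler:

```python
from graphlib import TopologicalSorter  # Python 3.9+ standard library

# A pipeline described as a graph: each step maps to the steps it depends on.
pipeline = {
    "preprocess": set(),
    "transform":  {"preprocess"},
    "train":      {"transform"},
    "serve":      {"train"},
}

# The workflow engine must run steps in an order that respects the edges.
order = list(TopologicalSorter(pipeline).static_order())
print(order)  # ['preprocess', 'transform', 'train', 'serve']
```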
- 1 participant
- 32 minutes
19 May 2022
Building AutoML Pipelines With Argo Workflows and Katib - Andrey Velichkevich, Apple + Johnu George, Nutanix
The fairly recent field of Automated Machine Learning (AutoML) provides a wealth of powerful algorithms for model selection and hyperparameter (HP) tuning, one of the most important steps of the MLOps lifecycle. Katib is a popular Kubernetes-native open source project for AutoML. Katib can tune HPs for models written in any framework, such as TensorFlow, PyTorch, MXNet, and scikit-learn. To find the best HPs, metrics are evaluated after a model training step. Usually, model training is a complex process that includes data preprocessing, data validation, the actual training, and more. This whole lifecycle can be represented as a workflow dependency graph by specifying the dependencies between model operations. Argo Workflows provides a great container-native workflow engine for orchestrating jobs on Kubernetes, which makes it an ideal candidate for Katib experiments. This talk will demonstrate how Argo Workflows integrates natively into the Katib infrastructure.
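Independent of Katib or Argo, the core HP-tuning loop can be sketched as random search over a space: sample a trial, train and evaluate, and keep the best result. In the self-contained illustration below, the quadratic objective is a placeholder for a real train-then-evaluate step, and the parameter names are hypothetical:

```python
import random

def objective(lr, depth):
    # Placeholder for "train a model and report a validation metric";
    # this toy surface peaks at lr=0.1, depth=6 with a maximum value of 0.
    return -((lr - 0.1) ** 2) - 0.01 * (depth - 6) ** 2

search_space = {"lr": (0.001, 1.0), "depth": (2, 10)}
random.seed(0)

best = None
for _ in range(50):  # 50 trials, as a fixed experiment budget
    trial = {
        "lr": random.uniform(*search_space["lr"]),
        "depth": random.randint(*search_space["depth"]),
    }
    score = objective(**trial)
    if best is None or score > best[0]:
        best = (score, trial)

print(best[1])  # hyperparameters of the best trial found
```

Katib's experiment controller generalizes this loop: the search algorithm proposes trials, each trial runs as a workload on Kubernetes, and reported metrics drive the next suggestions.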
- 2 participants
- 22 minutes
19 May 2022
Closing Remarks - Alex Collins, Intuit + Jessica Andersson, Annotell
- 2 participants
- 5 minutes
19 May 2022
Computer Vision Dog Breed Classification with Convolutional Neural Networks, TensorFlow and Kubeflow - Konstantinos Andriopoulos, Dorothea Kalliora, Arrikto
Sick of strangers at the dog park constantly commenting on how good-looking your pup is, but at a loss when they ask, “What breed is it?” Me too! Why not use AI to answer the question for you? For data scientists looking for an open-source, scalable way to tackle these sorts of problems, Kubernetes and Kubeflow make analyzing content in images and video much easier than trying to build everything from scratch and run it on bare metal or VMs. In this talk, we’ll work through the development of a notebook that leverages the combined powers of TensorFlow, ResNet-50 models, convolutional neural networks, VGG16, and transfer learning to see how accurately these algorithms can predict the breed of my dog. Spoiler alert! I have the genealogy results, so there will be a big reveal with DNA pitted against a variety of algorithms.
- 5 participants
- 29 minutes
19 May 2022
Debugging Machine Learning on the Edge with MLExray - Michelle Nguyen, Stanford
- 4 participants
- 27 minutes
19 May 2022
Efficient AutoML with Ludwig, Ray, and Nodeless Kubernetes - Anne Marie Holler, Elotl + Travis Addair, Predibase
The open-source platforms Ludwig and Ray make deep learning (DL) accessible to diverse users by reducing the complexity barriers to training, scaling, deploying, and serving DL models. Recently, Ludwig was extended to support AutoML for tabular datasets (v0.4.1) and for text classification datasets (v0.5.0), using Ray Tune for hyperparameter search. In this talk, we discuss how Ludwig AutoML exploits heuristics, developed using a set of training datasets, to efficiently produce models for validation datasets. We also show how running Ludwig AutoML on cloud Kubernetes clusters, using Nodeless K8s to add right-sized GPU resources when they are needed and remove them when they are not, reduces cost and operational overhead versus running directly on EC2.
- 3 participants
- 28 minutes
19 May 2022
Enhancing the Performance Testing Process for gRPC Model Inferencing at Scale - Ted Chang, Paul Van Eck, IBM
Performance testing is a critical part of software development that helps us identify bottlenecks early on and avoid costly crashes that impact operation. When it comes to thousands of machine learning models of many different formats and sizes, ensuring that users can perform inference on these models in reasonable time is paramount. In this session, we show how a Kubernetes cluster is set up with KServe's ModelMesh to enable high-density deployment of models for gRPC inference. Then, we demonstrate how we load test several thousand models, and how Prometheus and Grafana are used to visualize and monitor key performance metrics.
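The key metrics in such load tests are typically latency percentiles. As a small, self-contained illustration, the sketch below computes p50/p95/p99 from recorded per-request latencies; the samples are simulated here, whereas a real run would record gRPC inference round-trip times:

```python
import random

def percentile(samples, pct):
    """Nearest-rank percentile of a list of latency samples."""
    ordered = sorted(samples)
    rank = max(0, min(len(ordered) - 1, round(pct / 100 * len(ordered)) - 1))
    return ordered[rank]

random.seed(42)
# Simulated round-trip latencies (ms) for 1000 inference requests.
latencies = [random.gauss(mu=25, sigma=5) for _ in range(1000)]

report = {p: percentile(latencies, p) for p in (50, 95, 99)}
print({p: round(v, 1) for p, v in report.items()})
```

In practice these percentiles would be exported to Prometheus and graphed in Grafana rather than computed ad hoc, but the quantities being monitored are the same.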
- 4 participants
- 32 minutes
19 May 2022
Exploring ML Model Serving with KServe (with fun drawings) - Alexa Nicole Griffith, Bloomberg
KServe (formerly known as KFServing) provides an easy-to-use platform for deploying machine learning (ML) models. KServe is built on top of Kubernetes and provides performant, high-abstraction interfaces that allow data scientists to spend more time focusing on building new models and less time worrying about the underlying infrastructure. This open source project provides a simple, pluggable solution for common infrastructure issues with inference models, like GPU scaling and ModelMesh serving for high-volume/high-density use cases. From the perspective of an eager engineer new to the KServe community, we will explore the KServe features that solve common issues for engineers and data scientists who are interested in, or responsible for, machine learning model deployment. Expect to learn about KServe’s fundamental offerings, like out-of-the-box model serving and monitoring, as well as its exciting new advanced functionalities, such as its inference graph capabilities and ModelMesh features. We will discuss the host of new features added to the project since its publication in 2019 and also outline KServe’s roadmap as it moves toward its v1.0 release.
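For context, deploying a model with KServe centers on a single InferenceService resource. The sketch below assembles such a manifest as a plain Python dict; the field layout follows KServe's commonly documented v1beta1 examples (an sklearn predictor reading a model from a storage URI), so treat the exact schema as an assumption to verify against the current KServe docs:

```python
def inference_service(name, storage_uri):
    """Assemble a minimal KServe InferenceService manifest as a dict.

    Mirrors KServe's v1beta1 examples; check current docs before relying
    on the exact field names.
    """
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {
            "predictor": {
                "sklearn": {"storageUri": storage_uri},
            },
        },
    }

svc = inference_service("sklearn-iris", "gs://kfserving-examples/models/sklearn/1.0/model")
print(svc["apiVersion"], svc["metadata"]["name"])
```

Applying a manifest like this to the cluster is all a data scientist needs to do; KServe handles routing, scaling, and the model server underneath.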
- 5 participants
- 28 minutes
19 May 2022
Kubernetes + AI Joining Forces in the Battle Against Cancer - Wojciech Małota-Wójcik, Ridge
A doctor is able to treat one patient at a time. On the other hand, engineers may create software analyzing thousands of cases every day! This presentation will focus on how geographically distributed Kubernetes clusters and AI are indispensable tools for IT professionals and computational biologists as they join forces to battle cancer. Computational biology — combining AI, medicine, mathematics, statistics, IoT and cloud computing — is increasing the precision and capacity of diagnostic processes. This evolution requires petabytes of storage, low-latency networking, efficient GPUs and ease of deployment on a massive scale. This leads us directly to the growing need for highly geographically distributed Kubernetes clusters, running as close to the hospital as possible. The presentation will review the basics of processing cancer images and then show actual examples of how AI algorithms are developed and deployed on K8s clusters and used by doctors to perform life-saving treatments.
- 1 participant
- 26 minutes
19 May 2022
Managing Multi-Cloud Apache Spark on Kubernetes - Ilan Filonenko, Aki Sukegawa, Bloomberg
Bloomberg has built multi-cloud quant platforms on top of Kubernetes to enable its users to develop sophisticated financial applications with integrated first-class data science capabilities. In this journey, it quickly became clear that managing data science infrastructure in a multi-cloud environment is challenging, especially when it comes to Apache Spark. While Kubernetes provides an excellent abstraction for designing composable infrastructure substrates, it comes with a list of challenges when dealing with auto-scaling, scheduling, preemption, and security. Given these challenges, this talk will explore how one can effectively manage an expansive Spark infrastructure solution that spans bare-metal and multiple public cloud platforms. We will also walk through various observability strategies, primarily focusing on how to surface cluster information to a varied group of Spark end-users by leveraging a variety of native Kubernetes resources, like node autoscalers, controllers, and custom PodConditions.
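Running Spark on Kubernetes is driven largely by `spark.kubernetes.*` configuration passed to spark-submit. The sketch below assembles a submit command from such a conf; the flag and property names follow Spark's documented Kubernetes settings, while the image, namespace, and application path are illustrative values:

```python
def spark_submit_args(master, image, app, conf):
    """Build a spark-submit argument list for a Kubernetes master."""
    args = ["spark-submit", "--master", master, "--deploy-mode", "cluster"]
    merged = {"spark.kubernetes.container.image": image, **conf}
    for key, value in sorted(merged.items()):
        args += ["--conf", f"{key}={value}"]
    return args + [app]

args = spark_submit_args(
    master="k8s://https://kube-apiserver:6443",   # illustrative API server
    image="example.registry/spark:3.2.1",          # illustrative image tag
    app="local:///opt/spark/app.py",
    conf={
        "spark.executor.instances": "4",
        "spark.kubernetes.namespace": "quant-platform",  # illustrative
    },
)
print(" ".join(args))
```

Keeping the conf as data, as here, is also what makes multi-cloud management tractable: the same application can be submitted against different clusters by swapping the master, image registry, and namespace.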
- 5 participants
- 31 minutes
19 May 2022
Sponsored Keynote: Challenges and Opportunities in Making AI Easy and Efficient with Kubernetes - Maulin Patel, Google
- 1 participant
- 10 minutes