Cloud Native Computing Foundation Kubernetes AI Day NA 2022 Open Meetings

15 Nov 2022

Don’t miss out! Join us at our upcoming event: KubeCon + CloudNativeCon Europe in Amsterdam, The Netherlands from April 17-21, 2023. Learn more at https://kubecon.io. The conference features presentations from developers and end users of Kubernetes, Prometheus, Envoy, and all of the other CNCF-hosted projects.

Feathr, an Open Source, Battletested, Scalable and Enterprise-Grade Feature Store - Hangfei Lin, LinkedIn

In this presentation, we will introduce Feathr, an open-source feature store. Feathr has been used in production and battle-tested at LinkedIn for over 5 years, powering LinkedIn's entire machine learning feature platform. In this talk, we will cover the background of Feathr, its internal design and philosophy, and our journey building an enterprise feature store. Some of Feathr's highlights include native feature transformation support (feature aggregation, sliding window joins, feature lookup, etc.); a 30X faster running time thanks to Bloom filters, a join plan optimizer, salted joins, and other optimizations; and a cloud-native, simplified, and scalable architecture.
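To make the "salted join" optimization mentioned in the abstract concrete, here is a minimal pure-Python sketch of the general technique. It is not Feathr's implementation: the function names, the `NUM_SALTS` value, and the sample data are all hypothetical, chosen only for illustration. The idea is to append a random salt to each key on the large, skewed side so that one hot key spreads across several buckets, and to replicate each row of the small side once per salt so every salted key still finds its match.

```python
import random
from collections import defaultdict

NUM_SALTS = 4  # hypothetical number of salt buckets per key


def plain_join(large, small):
    """Reference inner join of two lists of (key, value) pairs."""
    index = defaultdict(list)
    for key, value in small:
        index[key].append(value)
    return [(key, lv, sv) for key, lv in large for sv in index.get(key, [])]


def salted_join(large, small, num_salts=NUM_SALTS, seed=0):
    """Inner join that salts keys so a hot key spreads over num_salts buckets."""
    rng = random.Random(seed)
    # Large (possibly skewed) side: attach a random salt to every key.
    salted_large = [((key, rng.randrange(num_salts)), value) for key, value in large]
    # Small side: replicate every row once per possible salt value.
    index = defaultdict(list)
    for key, value in small:
        for salt in range(num_salts):
            index[(key, salt)].append(value)
    # Join on the salted key, then drop the salt from the output.
    return [
        (key, lv, sv)
        for (key, salt), lv in salted_large
        for sv in index.get((key, salt), [])
    ]


if __name__ == "__main__":
    # One "hot" key dominates the large side (hypothetical data).
    large = [("hot", i) for i in range(100)] + [("cold", 0)]
    small = [("hot", "a"), ("cold", "b"), ("unused", "c")]
    joined = salted_join(large, small)
    print(len(joined), "rows; same rows as plain join:",
          sorted(joined) == sorted(plain_join(large, small)))
```

In a distributed engine, each salted key would typically be routed to its own partition, so the work for the hot key is shared across several workers instead of landing on one. The cost is that the small side grows by a factor of `num_salts`.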

5 participants
33 minutes

collaborating

feather

thanks

question

keynote

shirt

thiago

bring

thinking

tribal

28 Oct 2022

Are You Really Out of GPUs? How to Better Understand Your GPU Utilization - Natasha Romm & Raz Rotenberg, Run:AI

GPUs are in high demand, and we’re always running out of them - there’s just not enough of them in our cluster! But they are also expensive, so we can’t easily buy and add more and more of them to the resource pool. We could learn how to better utilize them, but before we even do that, we need to get a better visualization of our existing GPU utilization. This talk will explore some easy-to-use open-source tools to help you understand the GPU utilization in your Kubernetes cluster, and will help you make better decisions regarding your AI workload assignment on GPUs.

10 participants
32 minutes

gp

gpu

provisioning

manages

utilization

advanced

ai

profiling

models

problems

28 Oct 2022

Building a Batch Processing Platform for Running Data Pipelines Using Argo & Kubernetes - Rakesh Subramanian Suresh & Aroop Maliakkal Padmanabhan, Intuit

Intuit has built a highly scalable batch processing platform with Kubernetes and Argo to enable data engineers to easily deploy, manage, and schedule data pipelines. With hundreds of AI & Data engineering teams managing over 100,000 data pipelines, pipeline deployments face many challenges, including scheduling, orchestration, and managing complex dependencies to eliminate silos and increase processing effectiveness in the data lake. While there are solutions to each of these challenges independently, there isn’t one that holistically solves scheduling, pipeline dependency management, and infrastructure deployment and orchestration. In this talk, we will discuss using Argo Events, Argo Workflows, and Kubernetes to build and effectively manage an orchestration and scheduling engine for running various data processing use cases. We will also cover the learnings and operational challenges of managing this multi-cluster Kubernetes infrastructure and how Argo can be integrated with Kafka for zero-downtime scheduling.

7 participants
29 minutes

workflow

process

intuit

provisioning

processors

platform

batch

operational

bot

managed

28 Oct 2022

Building a Data Science Platform with Argo Workflows - J.P. Zivalich, Pipekit & Will Wang, Bloomberg

How do you automate thousands of data pipelines for a dozen data teams, each with their own pipeline requirements and use-cases, all deploying to multiple clusters in multiple regions? Bloomberg and Pipekit have taken their own unique approaches to solving data operations at scale using Argo Workflows and a multi-cluster Kubernetes architecture. Learn how these Argo users are creating reliable and scalable data science platforms that enable data scientists to self-serve their data engineering needs, from creating DAGs to analyzing logs and metrics.

3 participants
27 minutes

workflow

workflows

pipekit

tooling

infrastructure

process

kubernetes

application

api

docker

28 Oct 2022

DevOps in Data Science: What Works and What Doesn’t - Chase Christensen & Stefano Fioravanzo, Arrikto

We all want more models and better business buy-ins from our ML projects. Often before questioning quality, we begin spending far too much time engineering overly ambitious model development life cycles. When it comes to productizing a model, ML engineers want DevOps. Data Scientists want simplicity. This often leads to tension and technical debt.

Our goal was to leverage Kubeflow (the most widely used and mature OSS MLOps platform) to “shift left” by giving data scientists the power to use Kaniko to self-service build containers with Kubeflow Pipelines (KFP). ArgoCD was used to deliver the models. We failed, and we needed a DevOps detox. The CI process we imagined was complex and didn’t serve the data scientists in a meaningful way. We will discuss why KServe is a lightweight and production-ready solution that can improve the outcomes we initially sought with Kaniko and KFP, and how we as engineers can improve the OSS MLOps community.

2 participants
22 minutes

devopsy

kubeflow

kubernetes

workflow

developers

initiatives

scientists

users

orgs

data

28 Oct 2022

How Bloomberg Uses AI to Help Speed up Over-the-Counter Trading - Camilo Ortiz & Philipp Meerkamp

Did you know that trillions of dollars in securities are traded over-the-counter each year as a result of interactive conversations related to negotiated financial instrument transactions – also known as Dialogue Acts – conducted by traders using Bloomberg’s messaging tools (group chats and emails)? In this talk, we will describe how Bloomberg’s AI Engineering team builds and maintains services that detect offers of financial instruments in group chats to help traders who have requested to implement the add-on features that use these machine learning models to be more efficient in their negotiations. In particular, we will show how Kubernetes custom controllers – KServe and Kubeflow training operators – are critical in several steps of the model development life cycle (MDLC), including sampling of relevant data, training and distillation of models, A/B testing, and production deployments.

4 participants
24 minutes

securities

markets

transactions

kubernetes

trading

bank

bloomberg

etfs

clients

discussed

28 Oct 2022

Keynote: Building Enterprise AI/ML Platform on Kubernetes - Maulin Patel, Group Product Manager, Google

AI/ML practitioners are increasingly leveraging Kubernetes for data processing, training, and inference needs. Kubernetes is well suited to address their needs given its support for auto-provisioning, auto-scaling, and various machine types (e.g. CPU, GPU, TPU). AI/ML practitioners also benefit from Kubernetes' dynamic scheduling, high availability, Job API, portability, customizability, and fault tolerance capabilities. As such, many organizations are standardizing their AI/ML platforms on Kubernetes. In this talk, we will share the best practices for building AI/ML platforms on Kubernetes.

2 participants
12 minutes

kubernetes

workflow

machine

platform

manager

architectures

provisioning

motivations

devops

expertise

28 Oct 2022

Next-Generation Data Science Workflows Using Ray - Erik Erlandson, Red Hat

Learn how to create distributed and cloud native data science workflows with Ray on Kubernetes. Ray is rapidly gaining momentum as an open source parallel computing platform that provides a scale-out cluster model inspired by tools such as Spark and Flink, yet also supports a lightweight scale-to-zero Serverless style workflow that is designed natively for modern container platforms in the Kubernetes ecosystem. Ray implements a constellation of tools that support data science and devops activities ranging from ETL, feature extraction, model training, ML pipelines, all the way through serverless inferencing. In this talk, Erik and Michael will discuss deploying Ray onto Kubernetes and integrating it with JupyterHub to create distributed and cloud native data science workflows. They will demonstrate Ray in action, running an end-to-end data science project on Kubernetes. Attendees will learn how to leverage the capabilities of Ray to do cloud native data science.

2 participants
31 minutes

ray

programming

workflow

abstractions

parallelizable

gpu

graph

services

ai

transitioning

28 Oct 2022

On Data Locality in Kubernetes - Chen Wang, IBM Research & Shouwei Chen, Alluxio

While Kubernetes has made it exceptionally easy to deploy and scale data-intensive applications elastically, accessing data from cloud-native data sources (like AWS S3 or sometimes remote data warehouses) has become more challenging. Platform engineers often have to copy data to optimize I/O throughput, which is error-prone and time-consuming. As the Kubernetes ecosystem matures and becomes more efficient, this challenge becomes more pressing, and different attempts are being made to bring back data locality and influence workload scheduling. In this talk, we will discuss the pros and cons of different approaches to emulating or introducing data locality in Kubernetes schedulers. We believe this will become crucial for Kubernetes to achieve higher efficiency for data-intensive workloads in the near future.

6 participants
29 minutes

kubernetes

ai

infrastructure

microservices

research

workflows

ml

cluster

manage

tensorflow
