Cloud Native Computing Foundation / Kubernetes AI Day NA 2022

These are all the meetings we have in "Kubernetes AI Day NA…" (part of the organization "Cloud Native Computi…"). Click into individual meeting pages to watch the recording and search or read the transcript.

15 Nov 2022

Don’t miss out! Join us at our upcoming event: KubeCon + CloudNativeCon Europe in Amsterdam, The Netherlands, from April 17-21, 2023. Learn more at https://kubecon.io. The conference features presentations from developers and end users of Kubernetes, Prometheus, Envoy, and all of the other CNCF-hosted projects.

Feathr, an Open Source, Battletested, Scalable and Enterprise-Grade Feature Store - Hangfei Lin, LinkedIn

In this presentation, we introduce Feathr, an open-source feature store. Feathr has been used in production and battle-tested at LinkedIn for over five years, serving as the feature platform for all of LinkedIn's machine learning. In this talk, we will cover the background of Feathr, its internal design and philosophy, and our journey building an enterprise feature store. Feathr's highlights include native feature transformation support (feature aggregation, sliding-window joins, feature lookup, and more); 30x faster running time through Bloom filters, a join plan optimizer, salted joins, and other optimizations; and a simplified, scalable, cloud-native architecture.
  • 5 participants
  • 33 minutes
Keywords: collaborating, feather, thanks, question, keynote, shirt, thiago, bring, thinking, tribal

28 Oct 2022

Are You Really Out of GPUs? How to Better Understand Your GPU Utilization - Natasha Romm & Raz Rotenberg, Run:AI

GPUs are in high demand, and we’re always running out of them: there just aren’t enough of them in our cluster! But they are also expensive, so we can’t simply keep buying more and adding them to the resource pool. We could learn how to utilize them better, but before we even do that, we need better visibility into our existing GPU utilization. This talk will explore some easy-to-use open-source tools that help you understand GPU utilization in your Kubernetes cluster and make better decisions about assigning AI workloads to GPUs.
  • 10 participants
  • 32 minutes
Keywords: gp, gpu, provisioning, manages, utilization, advanced, ai, profiling, models, problems

28 Oct 2022

Building a Batch Processing Platform for Running Data Pipelines Using Argo & Kubernetes - Rakesh Subramanian Suresh & Aroop Maliakkal Padmanabhan, Intuit

Intuit has built a highly scalable batch processing platform with Kubernetes and Argo that enables data engineers to easily deploy, manage, and schedule data pipelines. With hundreds of AI and data engineering teams managing over 100,000 data pipelines, pipeline deployment faces many challenges, including scheduling, orchestration, and managing complex dependencies in order to eliminate silos and increase processing effectiveness in the data lake. While each of these challenges has solutions on its own, none holistically solves scheduling, pipeline dependency management, and infrastructure deployment and orchestration together. In this talk, we will discuss using Argo Events, Argo Workflows, and Kubernetes to build and effectively manage an orchestration and scheduling engine for a variety of data processing use cases. We will also cover the lessons learned and operational challenges of managing this multi-cluster Kubernetes infrastructure, and how Argo can be integrated with Kafka for zero-downtime scheduling.
  • 7 participants
  • 29 minutes
Keywords: workflow, process, intuit, provisioning, processors, platform, batch, operational, bot, managed

28 Oct 2022

Building a Data Science Platform with Argo Workflows - J.P. Zivalich, Pipekit & Will Wang, Bloomberg

How do you automate thousands of data pipelines for a dozen data teams, each with its own pipeline requirements and use cases, all deploying to multiple clusters in multiple regions? Bloomberg and Pipekit have each taken a unique approach to solving data operations at scale using Argo Workflows and a multi-cluster Kubernetes architecture. Learn how these Argo users are creating reliable and scalable data science platforms that let data scientists self-serve their data engineering needs, from creating DAGs to analyzing logs and metrics.
  • 3 participants
  • 27 minutes
Keywords: workflow, workflows, pipekit, tooling, infrastructure, process, kubernetes, application, api, docker

28 Oct 2022

DevOps in Data Science: What Works and What Doesn’t - Chase Christensen & Stefano Fioravanzo, Arrikto

We all want more models and better business buy-in from our ML projects. Often, before questioning quality, we spend far too much time engineering overly ambitious model development life cycles. When it comes to productizing a model, ML engineers want DevOps, while data scientists want simplicity. This often leads to tension and technical debt.

Our goal was to use Kubeflow (the most widely used and mature OSS MLOps platform) to “shift left” by giving data scientists the power to build their own containers, self-service, with Kaniko from Kubeflow Pipelines. Argo CD was used to deliver the models. We failed, and we needed a DevOps detox: the CI process we imagined was complex and didn’t serve the data scientists in a meaningful way. We will discuss why KServe is a lightweight, production-ready solution that can better achieve the outcomes we initially sought with Kaniko and KFP, and how we as engineers can improve the OSS MLOps community.
  • 2 participants
  • 22 minutes
Keywords: devopsy, kubeflow, kubernetes, workflow, developers, initiatives, scientists, users, orgs, data

28 Oct 2022

How Bloomberg Uses AI to Help Speed up Over-the-Counter Trading - Camilo Ortiz & Philipp Meerkamp

Did you know that trillions of dollars in securities are traded over the counter each year as a result of interactive conversations about negotiated financial instrument transactions (also known as Dialogue Acts) that traders conduct using Bloomberg’s messaging tools (group chats and emails)? In this talk, we will describe how Bloomberg’s AI Engineering team builds and maintains services that detect offers of financial instruments in group chats, helping traders who have requested the add-on features built on these machine learning models to negotiate more efficiently. In particular, we will show how Kubernetes custom controllers (KServe and the Kubeflow training operators) are critical to several steps of the model development life cycle (MDLC), including sampling relevant data, training and distilling models, A/B testing, and production deployment.
  • 4 participants
  • 24 minutes
Keywords: securities, markets, transactions, kubernetes, trading, bank, bloomberg, etfs, clients, discussed

28 Oct 2022

Keynote: Building Enterprise AI/ML Platform on Kubernetes - Maulin Patel, Group Product Manager, Google

AI/ML practitioners are increasingly leveraging Kubernetes for data processing, training, and inference. Kubernetes is well suited to these needs given its support for auto-provisioning, auto-scaling, and various machine types (e.g., CPU, GPU, TPU). AI/ML practitioners also benefit from Kubernetes' dynamic scheduling, high availability, Job API, portability, customizability, and fault tolerance. As a result, many organizations are standardizing their AI/ML platforms on Kubernetes. In this talk, we will share best practices for building AI/ML platforms on Kubernetes.
  • 2 participants
  • 12 minutes
Keywords: kubernetes, workflow, machine, platform, manager, architectures, provisioning, motivations, devops, expertise

28 Oct 2022

Next-Generation Data Science Workflows Using Ray - Erik Erlandson, Red Hat

Learn how to create distributed, cloud native data science workflows with Ray on Kubernetes. Ray is rapidly gaining momentum as an open source parallel computing platform that provides a scale-out cluster model inspired by tools such as Spark and Flink, yet also supports a lightweight, scale-to-zero, serverless-style workflow designed natively for modern container platforms in the Kubernetes ecosystem. Ray implements a constellation of tools that support data science and DevOps activities ranging from ETL, feature extraction, model training, and ML pipelines all the way through serverless inference. In this talk, Erik and Michael will discuss deploying Ray onto Kubernetes and integrating it with JupyterHub to create distributed, cloud native data science workflows. They will demonstrate Ray in action, running an end-to-end data science project on Kubernetes. Attendees will learn how to leverage Ray's capabilities to do cloud native data science.
  • 2 participants
  • 31 minutes
Keywords: ray, programming, workflow, abstractions, parallelizable, gpu, graph, services, ai, transitioning

28 Oct 2022

On Data Locality in Kubernetes - Chen Wang, IBM Research & Shouwei Chen, Alluxio

While Kubernetes has made it exceptionally easy to deploy and elastically scale data-intensive applications, accessing data from cloud-native data sources (such as AWS S3 or remote data warehouses) has become more challenging. Platform engineers often have to copy data to optimize I/O throughput, which is error-prone and time-consuming. As the Kubernetes ecosystem matures and becomes more efficient, this challenge becomes more pressing, and various attempts are being made to bring back data locality and influence workload scheduling. In this talk, we will discuss the pros and cons of different approaches to emulating or introducing data locality in Kubernetes schedulers. We believe data locality will become crucial for Kubernetes to achieve higher efficiency for data-intensive workloads in the near future.
  • 6 participants
  • 29 minutes
Keywords: kubernetes, ai, infrastructure, microservices, research, workflows, ml, cluster, manage, tensorflow