15 Nov 2022
Don’t miss out! Join us at our upcoming event: KubeCon + CloudNativeCon Europe in Amsterdam, The Netherlands from April 17-21, 2023. Learn more at https://kubecon.io. The conference features presentations from developers and end users of Kubernetes, Prometheus, Envoy, and all of the other CNCF-hosted projects.
Feathr, an Open Source, Battletested, Scalable and Enterprise-Grade Feature Store - Hangfei Lin, LinkedIn
In this presentation, we introduce Feathr, an open-source feature store. Feathr has been used in production and battle-tested at LinkedIn for over 5 years, serving the entire LinkedIn machine learning feature platform. We will cover the background of Feathr, its internal design and philosophy, and our journey building an enterprise feature store. Feathr's highlights include native feature transformation support (feature aggregation, sliding window joins, feature lookup, etc.); 30X faster running time through bloom filters, a join plan optimizer, salted joins, and other optimizations; and a cloud-native, simplified, and scalable architecture.
- 5 participants
- 33 minutes
28 Oct 2022
Are You Really Out of GPUs? How to Better Understand Your GPU Utilization - Natasha Romm & Raz Rotenberg, Run:AI
GPUs are in high demand, and we’re always running out of them - there’s just not enough of them in our cluster! But they are also expensive, so we can’t easily buy and add more and more of them to the resource pool. We could learn how to better utilize them, but before we even do that, we need to get a better visualization of our existing GPU utilization. This talk will explore some easy-to-use open-source tools to help you understand the GPU utilization in your Kubernetes cluster, and will help you make better decisions regarding your AI workload assignment on GPUs.
- 10 participants
- 32 minutes
28 Oct 2022
Building a Batch Processing Platform for Running Data Pipelines Using Argo & Kubernetes - Rakesh Subramanian Suresh & Aroop Maliakkal Padmanabhan, Intuit
Intuit has built a highly scalable batch processing platform with Kubernetes and Argo to enable data engineers to easily deploy, manage, and schedule data pipelines. With hundreds of AI & Data engineering teams managing over 100,000 data pipelines, pipeline deployments face many challenges, including scheduling, orchestration, and managing complex dependencies to eliminate silos and increase processing effectiveness in the data lake. While there are solutions to each of these challenges independently, there isn’t one that holistically solves scheduling, pipeline dependency management, and infrastructure deployment and orchestration. In this talk, we will discuss using Argo Events, Argo Workflows, and Kubernetes to build and effectively manage an orchestration and scheduling engine for running various data processing use cases. We will also cover the learnings and operational challenges of managing this multi-cluster Kubernetes infrastructure and how Argo can be integrated with Kafka for zero-downtime scheduling.
- 7 participants
- 29 minutes
28 Oct 2022
Building a Data Science Platform with Argo Workflows - J.P. Zivalich, Pipekit & Will Wang, Bloomberg
How do you automate thousands of data pipelines for a dozen data teams, each with their own pipeline requirements and use-cases, all deploying to multiple clusters in multiple regions? Bloomberg and Pipekit have taken their own unique approaches to solving data operations at scale using Argo Workflows and a multi-cluster Kubernetes architecture. Learn how these Argo users are creating reliable and scalable data science platforms that enable data scientists to self-serve their data engineering needs, from creating DAGs to analyzing logs and metrics.
- 3 participants
- 27 minutes
28 Oct 2022
DevOps in Data Science: What Works and What Doesn’t - Chase Christensen & Stefano Fioravanzo, Arrikto
We all want more models and better business buy-ins from our ML projects. Often before questioning quality, we begin spending far too much time engineering overly ambitious model development life cycles. When it comes to productizing a model, ML engineers want DevOps. Data Scientists want simplicity. This often leads to tension and technical debt.
Our goal was to use Kubeflow (the most widely used and mature OSS MLOps platform) to “shift left” by giving data scientists the power to build containers on a self-service basis with Kaniko and Kubeflow Pipelines (KFP). ArgoCD was used to deliver the models. We failed and we needed a DevOps detox. The CI process we imagined was complex and didn’t serve the data scientists in a meaningful way. We will discuss why KServe is a lightweight and production-ready solution that can improve the outcomes we initially sought with Kaniko and KFP, and how we as engineers can improve the OSS MLOps community.
- 2 participants
- 22 minutes
28 Oct 2022
How Bloomberg Uses AI to Help Speed up Over-the-Counter Trading - Camilo Ortiz & Philipp Meerkamp
Did you know that trillions of dollars in securities are traded over-the-counter each year as a result of interactive conversations related to negotiated financial instrument transactions – also known as Dialogue Acts – conducted by traders using Bloomberg’s messaging tools (group chats and emails)? In this talk, we will describe how Bloomberg’s AI Engineering team builds and maintains services that detect offers of financial instruments in group chats. These services help traders who have requested the add-on features that use these machine learning models to be more efficient in their negotiations. In particular, we will show how Kubernetes custom controllers – KServe and Kubeflow training operators – are critical in several steps of the model development life cycle (MDLC), including sampling of relevant data, training and distillation of models, A/B testing, and production deployments.
- 4 participants
- 24 minutes
28 Oct 2022
Keynote: Building Enterprise AI/ML Platform on Kubernetes - Maulin Patel, Group Product Manager, Google
AI/ML practitioners are increasingly leveraging Kubernetes for data processing, training, and inference needs. Kubernetes is well suited to address their needs given its support for auto-provisioning, auto-scaling, and various machine types (e.g. CPU, GPU, TPU). AI/ML practitioners also benefit from Kubernetes' dynamic scheduling, high availability, job API, portability, customizability, and fault tolerance capabilities. As such, many organizations are standardizing their AI/ML platform on Kubernetes. In this talk, we will share the best practices for building AI/ML platforms on Kubernetes.
- 2 participants
- 12 minutes
28 Oct 2022
Next-Generation Data Science Workflows Using Ray - Erik Erlandson, Red Hat
Learn how to create distributed and cloud native data science workflows with Ray on Kubernetes. Ray is rapidly gaining momentum as an open source parallel computing platform that provides a scale-out cluster model inspired by tools such as Spark and Flink, yet also supports a lightweight, scale-to-zero, serverless-style workflow designed natively for modern container platforms in the Kubernetes ecosystem. Ray implements a constellation of tools that support data science and DevOps activities, from ETL, feature extraction, model training, and ML pipelines all the way through serverless inferencing. In this talk, Erik and Michael will discuss deploying Ray onto Kubernetes and integrating it with JupyterHub to create distributed and cloud native data science workflows. They will demonstrate Ray in action, running an end-to-end data science project on Kubernetes. Attendees will learn how to leverage the capabilities of Ray to do cloud native data science.
- 2 participants
- 31 minutes
28 Oct 2022
On Data Locality in Kubernetes - Chen Wang, IBM Research & Shouwei Chen, Alluxio
While Kubernetes has made it exceptionally easy to deploy and scale data-intensive applications elastically, accessing data from cloud-native data sources (like AWS S3 or sometimes remote data warehouses) has become more challenging. Platform engineers often have to copy data to optimize I/O throughput, which is error-prone and time-consuming. As the Kubernetes ecosystem matures and becomes more efficient, this challenge becomes more pressing, and different attempts are being made to bring back data locality and influence workload scheduling. In this talk, we will discuss the pros and cons of different approaches to emulate or introduce data locality in Kubernetes schedulers. We believe this will become crucial for Kubernetes to achieve higher efficiency in the near future for data-intensive workloads.
- 6 participants
- 29 minutes