From YouTube: Taming Data/State Challenges for ML Applications and Kubeflow - Skyler Thomas, Hewlett Packard

Description

Taming Data/State Challenges for ML Applications and Kubeflow - Skyler Thomas, Hewlett Packard Enterprise

The Kubeflow project brings powerful machine learning frameworks like TensorFlow and PyTorch to Kubernetes. The ability to parallelize training and to scale workflows up and down is revolutionary. However, state and persistent storage are a much bigger challenge for machine learning workloads because of their training data, library files, and models. We will discuss what it took to create AI/ML environments that run thousands of pods and request petabytes of training data. We will explore the various state and storage challenges that crop up when you are building Kubeflow applications, and where distributed persistent storage solutions fit in the picture. We will address how various storage APIs, including POSIX/CSI solutions, NFS, S3, and HDFS, fit into these solutions. We will also discuss data security and privacy issues.

https://sched.co/Zeq3