From YouTube: Is There a Place For Distributed Storage For AI/ML on Kubernetes? - Diane Feddema & Kyle Bader

Description

Is There a Place For Distributed Storage For AI/ML on Kubernetes? - Diane Feddema & Kyle Bader, Red Hat

Containerized machine learning workloads running on Kubernetes gain portability, declarative configuration, and less administrative toil, all with marginal performance impact. The best published results for performance-sensitive machine learning workloads, e.g. MLPerf v0.6, were obtained by reading the datasets from local SSDs. While the MLPerf datasets fit comfortably on a single SSD, that is a luxury not afforded to folks training models against petabyte-scale datasets. We’ll share our experience running MLPerf training jobs in Kubernetes against datasets stored by Kubernetes stateful storage services orchestrated by Rook. Highlights include the performance and scalability tradeoffs between local and open source distributed storage, and how machine learning formats like RecordIO and TFRecord provide performance utility and model validation flexibility.

https://sched.co/ZerS