From YouTube: Sponsor Demo: AWS - TorchElastic Controller for Kubernetes

Description

Amazon EKS has quickly emerged as a leading choice for machine learning workloads. In this session, we’ll walk through some of the recent ML-related enhancements the Kubernetes team at AWS has released. We will then dive deep with a walkthrough of the TorchElastic Controller for Kubernetes, a new open source collaboration between the Kubernetes team at AWS and the PyTorch team at Facebook. The controller addresses limitations in distributed training on Kubernetes and unlocks new capabilities for PyTorch-built models, including the ability to train on EC2 Spot instances, run jobs that are resilient to hardware failures, and dynamically scale jobs based on cluster resource availability. For more information, read the launch blog: https://aws.amazon.com/blogs/containers/fault-tolerant-distributed-machine-learning-training-with-the-torchelastic-controller-for-kubernetes/