From YouTube: Building Armada – Running Batch Jobs at Massive Scale on Kubernetes - Jamie Poole, G-Research

Description

Thousands of GPUs. Hundreds of thousands of CPUs. Learn how (and why!) G-Research designed and built Armada, a system that enables massive throughput of batch jobs running on Kubernetes. In this session you’ll hear how we use large-scale batch compute on Kubernetes to spot patterns in financial markets and predict the future. Armada lets us schedule millions of batch jobs across many clusters and tens of thousands of nodes, making optimum use of our hardware so that our researchers can run the latest machine-learning and advanced data science techniques across vast datasets. We’ll cover the architecture and approach of Armada, challenges and techniques for running Kubernetes at scale, and some war stories and lessons learned along the way.