youtube image
From YouTube: Lightning Talk: Fluence: Approaching a Converged Computing Environm...Daniel Milroy & Claudia Misale

Description

Don’t miss out! Join us at our upcoming event: KubeCon + CloudNativeCon Europe 2023 in Amsterdam, The Netherlands from April 17-21. Learn more at https://kubecon.io​. The conference features presentations from developers and end users of Kubernetes, Prometheus, Envoy, and all of the other CNCF-hosted projects.

Lightning Talk: Fluence: Approaching a Converged Computing Environment - Daniel Milroy, Lawrence Livermore National Laboratory & Claudia Misale, IBM T.J. Watson Research Center

Adoption of cloud technologies by high performance computing (HPC) is accelerating, and HPC users want their applications to perform well everywhere. While container orchestration provides resiliency, elasticity, and declarative management, it is not designed to enable app performance like HPC schedulers. In particular, Kube-scheduler is not suited to scheduling emerging HPC workflows that require pods placed advantageously. In response to interest in scheduling flexibility, the K8s community developed the Scheduling Framework to integrate new policies and schedulers. KubeFlux, a Scheduling Framework plugin based on the Fluxion open-source HPC scheduler, provides HPC scheduling capability in K8s. We detail our improvements to the MPI Operator and demonstrate its scalability to 16,384 ranks. With the improved operator we compare the performance of HPC benchmark apps scheduled by Kube-scheduler and KubeFlux. We conclude that KubeFlux makes pod placements that enable much higher app performance than Kube-scheduler. KubeFlux is an example of the rich capability that can be added to K8s and paves the way to converged computing environments with the best capabilities of HPC and cloud.