From YouTube: Kueue: A Kubernetes-native Job Queueing - Abdullah Gharaibeh, Google

Description

Most Kubernetes core components are pod-centric, including the scheduler and the cluster autoscaler. This works well for service workloads, where the pods of a service are mostly independent and all services are expected to be running at all times. For batch workloads, however, it does not make sense to focus only on pods: partially executing the pods of multiple parallel batch jobs can lead to deadlocks in which many jobs are simultaneously active while none can make sufficient progress to complete, or even start at all. Even for single-pod batch jobs, whether on-prem or in the cloud with autoscaling capabilities, the reality is that clusters have finite capacity: constraints on resource usage exist for quota and cost management (especially for GPUs), so users want an easy way to share resources fairly and efficiently. Kueue addresses these limitations by offering the queueing capabilities commonly found in legacy batch schedulers in the most Kubernetes-native way. It is a Kubernetes subproject currently under development at https://github.com/kubernetes-sigs/kueue.