From YouTube: Managing Multi-Cloud Apache Spark on Kubernetes - Ilan Filonenko, Aki Sukegawa, Bloomberg

Description

Bloomberg has built multi-cloud quant platforms on top of Kubernetes that let its users develop sophisticated financial applications with integrated, first-class data science capabilities. Along the way, it quickly became clear that managing data science infrastructure in a multi-cloud environment is challenging, especially for Apache Spark. While Kubernetes provides an excellent abstraction for designing composable infrastructure substrates, it also introduces challenges around auto-scaling, scheduling, preemption, and security. With these in mind, this talk explores how to effectively manage an expansive Spark infrastructure that spans bare metal and multiple public cloud platforms. We will also walk through several observability strategies, focusing primarily on how to surface cluster information to a diverse group of Spark end users by leveraging native Kubernetes resources such as node autoscalers, controllers, and custom PodConditions.