From YouTube: Breakout Session: Managing 250+ ES & Kafka Clusters using Operator Frameworks - Galuh & Daniel

Description

Gojek, a decacorn with 100 million users in Southeast Asia, used to have developer teams that managed their own Elasticsearch logging clusters. However, the teams differed in expertise and available time, so quality of service (QoS) varied from cluster to cluster. To solve this issue, the infrastructure team started managing those clusters itself, provisioning Elasticsearch and Kafka clusters on LXC containers on behalf of the developer teams.

The initial architecture worked fine for 30+ clusters, but significant growth now requires the team to manage 250+ clusters. Elasticsearch and Kafka require special care to administer; for example, an Elasticsearch cluster's status should be green before another Elasticsearch node is turned off. This complexity slows down maintenance operations: upgrading these clusters, for instance, takes the team weeks. This talk presents how Operator Frameworks reduce our daily toil.
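
The "green before turning off another node" rule can be illustrated with a minimal Python sketch. This is not the operator presented in the talk; it only shows the gating idea. It assumes the standard Elasticsearch `GET /_cluster/health` endpoint, whose JSON response carries a `status` field of `green`, `yellow`, or `red`. The function names (`fetch_cluster_status`, `wait_for_green`, `rolling_restart`) and the `restart_node` callback are hypothetical and introduced here for illustration.

```python
import json
import time
import urllib.request
from typing import Callable, Iterable, List


def fetch_cluster_status(base_url: str) -> str:
    """Return the 'status' field of GET /_cluster/health (green, yellow, or red)."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=10) as resp:
        return json.load(resp)["status"]


def wait_for_green(
    get_status: Callable[[], str],
    attempts: int = 30,
    delay_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_status up to `attempts` times; True as soon as it reports green."""
    for attempt in range(attempts):
        if get_status() == "green":
            return True
        if attempt < attempts - 1:
            sleep(delay_s)  # back off before the next poll
    return False


def rolling_restart(
    nodes: Iterable[str],
    get_status: Callable[[], str],
    restart_node: Callable[[str], None],  # hypothetical: stops and restarts one node
    **wait_kwargs,
) -> List[str]:
    """Restart nodes one at a time, touching a node only while the cluster is green."""
    done: List[str] = []
    for node in nodes:
        if not wait_for_green(get_status, **wait_kwargs):
            raise RuntimeError(f"cluster not green; refusing to restart {node}")
        restart_node(node)
        done.append(node)
    return done
```

Against a real cluster, `get_status` could be `lambda: fetch_cluster_status("http://localhost:9200")` (the URL is an assumption). Passing the status check and the restart action in as callables keeps the gating logic testable without a live cluster, and the loop stops at the first node it cannot safely restart rather than continuing with a degraded cluster.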