From YouTube: Revealing BlueStore Corruption Bugs in Containerized Ceph Clusters

Description

Presented by: Satoru Takeuchi

Cybozu has been running and testing its Rook/Ceph clusters for two years. During this time, the clusters have suffered from a number of BlueStore corruption bugs (e.g. #51034 and #53184). Most of the corruption occurred just after OSD creation or when OSDs were restarted. Cybozu was able to detect these problems because the nodes in its clusters are restarted frequently and each integration test creates many OSDs. These scenarios are uncommon in traditional Ceph clusters but common in containerized ones. The talk describes the known problems in detail and how Cybozu worked with the Ceph community to overcome them. It also proposes improvements to the QA process to prevent similar problems in the future.