From YouTube: CRUSH-ing the OSD Variance Problem - Tom Byrne, Storage Sysadmin

Description


Tom will be talking about the challenges of keeping OSD utilization variance under control in a large, rapidly growing cluster.

In the past year he has been managing a large and heavily utilized Ceph cluster that has grown from 1500 OSDs to 5000 OSDs (40PB), while maintaining an average OSD utilization of over 50% throughout the year. This has presented some unique challenges, and Tom will discuss these, the positive impact the upmap balancer has had on the process, and general advice for growing near-full clusters.
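
To make "OSD utilization variance" concrete, here is a minimal sketch (not from the talk) that parses the JSON output of "ceph osd df --format json" and reports the spread of per-OSD utilization. The "nodes"/"utilization" field names are assumed from recent Ceph releases and may differ on older versions.

import json
import statistics
import subprocess

def osd_utilizations():
    # Ask the cluster for per-OSD usage; "utilization" is the percent-used
    # figure reported by "ceph osd df" (field name assumed, see note above).
    raw = subprocess.run(
        ["ceph", "osd", "df", "--format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [node["utilization"] for node in json.loads(raw)["nodes"]]

if __name__ == "__main__":
    utils = osd_utilizations()
    print(f"OSDs: {len(utils)}")
    print(f"mean utilization: {statistics.mean(utils):.1f}%")
    print(f"std deviation:    {statistics.stdev(utils):.2f}")
    print(f"fullest OSD: {max(utils):.1f}%, emptiest OSD: {min(utils):.1f}%")

The gap between the fullest OSD and the mean is what limits usable capacity in a near-full cluster, which is the problem the upmap balancer addresses.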

About Tom Byrne
Science and Technology Facilities Council
Storage Sysadmin
Oxford, United Kingdom
The Science and Technology Facilities Council is a world-leading multi-disciplinary science organisation. We provide access to large-scale facilities across a range of physical and life sciences, enabling research and innovation in these areas. We do world-leading research, and need world-leading computing to facilitate it.

As a storage systems administrator for STFC, I've been working with Ceph for a number of years as part of a team that manages several Ceph clusters for various use cases. The largest cluster currently has five thousand disks and 40PB of raw capacity, and is used to store data for the LHC experiments at CERN.

Running a cluster at this scale has presented a unique set of challenges, and has allowed me to develop an understanding of how to make the best use of Ceph’s features to maximise efficiency, data security and performance. Since the last Cephalocon we have tripled the size of this large, very full cluster, going from 1.5k to 4.7k OSDs while continuing to fill it to capacity, which has yielded a number of insights into managing the growth of a large, full Ceph cluster.

I spoke previously about our large Ceph cluster at Cephalocon APAC 2018 (talk title: Erasure Code at Scale), and have presented on Ceph at various conferences in the field of computing for high energy physics.