Just rebuilt onto Ceph and it’s a game changer. Drive fails? Who cares, replace it with a bigger drive and go about your day. If the total drive count is large enough, and depending on whether you’re using erasure coding (EC) or replication, recovery can pull data from tons of drives instead of a handful.
It’s still the same issue, RAID or Ceph. If a physical drive can only write 100 MB/s, a 36 TB drive will take 360,000 seconds (6,000 minutes, or 100 hours) to fill. During that 100-hour window, you’re down a drive and vulnerable to a second failure. Both RAID and Ceph can be configured for more redundancy at the cost of usable capacity, but even Ceph fails (dropping to read-only mode, or losing data) if too many physical drives fail.
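For anyone who wants to plug in their own numbers, here is a minimal sketch of that arithmetic. It assumes decimal units (1 TB = 1,000,000 MB) and a constant sustained write speed, which real drives won't hold for the whole rebuild; `rebuild_hours` is just an illustrative name.

```python
def rebuild_hours(capacity_tb: float, write_mb_per_s: float) -> float:
    """Hours needed to fill a drive at a constant sustained write speed."""
    capacity_mb = capacity_tb * 1_000_000  # assumption: decimal units, 1 TB = 1,000,000 MB
    seconds = capacity_mb / write_mb_per_s
    return seconds / 3600


# The case from the post: a 36 TB drive written at 100 MB/s.
print(rebuild_hours(36, 100))  # 100.0
```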
While true, Ceph can rebuild the replacement from data spread across far more drives than RAID can, so the point I was trying to make is that the risk of a second failure during resilvering can be greatly mitigated by using a Ceph setup.
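A rough toy model of that point, not a description of how Ceph actually schedules recovery: it assumes the lost data is re-created in parallel across the surviving drives rather than written to one replacement drive, and that each surviving drive gives up a fixed slice of bandwidth to recovery. The function name and the example numbers (50 drives, 20 MB/s each) are made up for illustration.

```python
def parallel_recovery_hours(data_tb: float, surviving_drives: int,
                            recovery_mb_per_s_each: float) -> float:
    """Hours to re-create lost data when many drives share the recovery work."""
    # Assumption: recovery bandwidth adds up linearly across drives,
    # with no network or CPU bottleneck.
    aggregate_mb_per_s = surviving_drives * recovery_mb_per_s_each
    data_mb = data_tb * 1_000_000  # decimal units, 1 TB = 1,000,000 MB
    return data_mb / aggregate_mb_per_s / 3600


# Hypothetical cluster: 36 TB to recover, 50 surviving drives,
# each spending only 20 MB/s on recovery.
print(parallel_recovery_hours(36, 50, 20))  # 10.0
```

Under those assumptions the exposure window shrinks from about 100 hours to about 10, even though no single drive is working harder than 20 MB/s. In practice the network, recovery throttling, and how full the cluster is will all move that number.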