Why RAID 5 still works, usually
HOW DOES RAID 5 REDUCE DATA LOSS?
RAID 5 takes your data and adds parity data that makes it possible to reconstruct the original data if a drive fails. RAID 6 is similar, except that it can reconstruct the data after two drive failures. So why would it stop working?
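RAID 5 parity is conventionally the byte-wise XOR of the data blocks in a stripe, so any one missing block equals the XOR of all the blocks that survive. The sketch below shows that idea on a single hypothetical stripe of three data blocks plus one parity block. It ignores how real controllers rotate parity across drives and size their stripes, and the names (`xor_blocks`, the sample data) are purely illustrative.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks (illustrative helper)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical stripe: three data blocks, each on its own drive.
data = [b"RAID", b"five", b"demo"]
parity = xor_blocks(data)  # written to a fourth drive

# Simulate losing the drive holding data[1]: XOR the survivors with parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'five'
```

The rebuild has to read every surviving block in the stripe, which is why a single unreadable sector on a surviving drive can stop the reconstruction.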
THE URE PROBLEM
RAID 5 works fine when there are no further failures or errors during data reconstruction. Back in 2007, though, almost all SATA drives, and many SCSI drives, were spec’d at one Unrecoverable Read Error (URE) per 10^14 bits read. That’s one URE every 12.5TB.
One-terabyte drives were coming into production then. If you had an eight-drive RAID 5 stripe and one drive failed, the RAID controller would have to read 7TB of data from the seven surviving drives to reconstruct the failed drive.
With one URE expected every 12.5TB, reading 7TB meant roughly even odds that a URE would scuttle the entire reconstruction. When that happens, the rebuild fails and you have to restore from backup anyway, so it would have been faster to use the backup in the first place.
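The arithmetic behind the 2007 scenario can be sketched as follows. This assumes every bit read fails independently at exactly the spec’d rate, which is a simplification (spec sheets quote a worst-case rate, and real errors cluster), and it uses the Poisson approximation for the chance of at least one URE. The function name `p_ure_during_rebuild` is hypothetical.

```python
import math

TB = 10**12  # drive vendors use decimal terabytes


def p_ure_during_rebuild(tb_read: float, bits_per_ure: float) -> float:
    """Chance of at least one URE while reading tb_read terabytes,
    assuming independent bit errors at the spec'd rate (Poisson model)."""
    expected_ures = tb_read * TB * 8 / bits_per_ure
    return 1 - math.exp(-expected_ures)


spec = 1e14                     # one URE per 10^14 bits read
tb_per_ure = spec / 8 / TB      # 12.5 TB read per URE, on average

drives, capacity_tb = 8, 1      # eight 1TB drives in RAID 5
tb_read = (drives - 1) * capacity_tb    # 7TB from the surviving drives
expected = tb_read * TB * 8 / spec      # 0.56 UREs expected
print(tb_per_ure, expected, round(p_ure_during_rebuild(tb_read, spec), 2))
```

The naive ratio 7/12.5 gives 56 percent; the independent-error model gives about 43 percent. Either way, a 2007-era rebuild was close to a coin flip.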
Of course, drives have only gotten bigger: four-terabyte drives are common, and we now have 10TB drives.
WHY DOES RAID 5 STILL WORK?
Simple: drive vendors upped the spec, for some drives, to one URE per 10^15 bits read, or about 125TB. But drive capacities have also grown roughly 10x, so the risk of a URE during reconstruction is coming back.
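To see why the problem returns, the sketch below applies the same simplified independent-error model to a hypothetical eight-drive RAID 5 array built from 10TB drives, so a rebuild reads 70TB. It compares the older 10^14 spec with the improved 10^15 spec. The helper is redefined here so the block runs on its own, and the array size is an assumption chosen to mirror the 2007 example.

```python
import math

TB = 10**12  # decimal terabytes, as on drive spec sheets


def p_ure_during_rebuild(tb_read: float, bits_per_ure: float) -> float:
    """Poisson estimate of at least one URE while reading tb_read terabytes."""
    expected_ures = tb_read * TB * 8 / bits_per_ure
    return 1 - math.exp(-expected_ures)


tb_read = (8 - 1) * 10  # eight 10TB drives: rebuild reads 70TB
for spec in (1e14, 1e15):
    print(f"1 URE per {spec:.0e} bits: {p_ure_during_rebuild(tb_read, spec):.3f}")
# 1 URE per 1e+14 bits: 0.996
# 1 URE per 1e+15 bits: 0.429
```

Under these assumptions, the 10x better spec on 10x larger drives puts a rebuild right back at the 2007 odds, while the old spec would make a failed rebuild nearly certain.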