RAID Tip 9 of 10 - Monitor the RAID performance

The ability of the RAID to handle failures of its hard drives relies on two things:

  1. Built-in redundancy of the storage. The RAID sets aside some space for redundant data, so the end-user capacity of a fault-tolerant array is always less than the combined capacity of its member disks.
  2. Diligence of the people who must provide replacement storage once the built-in reserve has been used up.
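To make the capacity cost of redundancy concrete, here is a minimal sketch in Python. It assumes identical member disks, covers only the common RAID 1, RAID 5 and RAID 6 layouts, and ignores filesystem and metadata overhead. The function name and the level labels are made up for this illustration.

```python
# Disks' worth of space consumed by redundancy and the minimum member count
# for a few common fault-tolerant levels (identical disks assumed).
MIN_DISKS = {"raid1": 2, "raid5": 3, "raid6": 4}


def usable_capacity_tb(level, disk_count, disk_size_tb):
    """Return the end-user capacity in TB, ignoring filesystem overhead."""
    if level not in MIN_DISKS:
        raise ValueError(f"unsupported level: {level}")
    if disk_count < MIN_DISKS[level]:
        raise ValueError(f"{level} needs at least {MIN_DISKS[level]} disks")
    if level == "raid1":
        # Every member holds a full copy, so only one disk's worth is usable.
        return disk_size_tb
    if level == "raid5":
        # One disk's worth of space holds parity.
        return (disk_count - 1) * disk_size_tb
    # raid6: two disks' worth of space hold parity.
    return (disk_count - 2) * disk_size_tb


if __name__ == "__main__":
    for level, count in (("raid1", 2), ("raid5", 4), ("raid6", 4)):
        raw = count * 2.0
        usable = usable_capacity_tb(level, count, 2.0)
        print(f"{level}: {count} x 2 TB = {raw} TB raw, {usable} TB usable")
```

In every case the usable figure is smaller than the raw total. The difference is the built-in reserve that absorbs the first drive failure.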

Once redundancy is lost because of the first drive failure, human intervention is needed to correct the problem and restore the required level of redundancy. Redundancy must be restored quickly; otherwise there is no point in having the RAID at all.

However, you will not know when to act unless you monitor the array and its disks often enough.

  • Regularly check the SMART status on the drives using appropriate software. With a software RAID, use SpeedFan or HDDLife. With a hardware RAID, use the vendor-supplied monitoring software.
  • Use "scrubbing" whenever possible. The scrubbing process reads all the data on the array during idle periods or on a predefined schedule. This uncovers newly developed bad sectors on the drives before they are encountered in actual use. The data can then be relocated away from the unreliable spots, or the disk can be replaced.
  • Any unexplained drop in throughput may indicate a problem with one of the hard drives. With certain systems (like a QNAP NAS based on Linux MD RAID) an imminent disk failure may manifest itself as the unit "stalling" long before the drive is declared dead (see QNAP story).

Copyright © 2011 - 2023