So far, the rebuild process is going along well. Later on the 14th, I went through the hardware on the system, thinking perhaps a controller had died, or a power splitter had failed. Neither was the case, as I had forgotten that I don’t have any controllers with only two disks plugged in, and the box isn’t actually using any splitters at all. I went through the whole thing and cleaned it out and checked all of the connections, and swapped a few SATA cables that seemed to be iffy just to be safe.
After that I fired it back up and was immediately greeted with fsck running first on the boot drive (an old 80GB PATA Western Digital), which passed, and then on the RAID0 scratch array (two 200GB Maxtor SATA drives), which also passed. Of course, /dev/md0 wasn’t started properly since it was down two devices, so it couldn’t even be mounted for its overdue fsck. Ubuntu gave me the choice to continue booting or drop to a root prompt, and I went with the root prompt.
I ran the following, which made some progress:
mdadm --assemble --force --verbose /dev/md0 /dev/sdc1 /dev/sdd1 ...
For whatever reason the drive letters assigned change around each time I reboot, so I don’t recall which ones were actually in the command. It then tried to start the array, at which point I was greeted with quite a few console messages about how ata2 and /dev/sdb were not responding properly. After a short time of that, it marked only that drive as failed and the array indicated 5/6 available – degraded, but working and valid data. Ran smartctl -i /dev/sdb to find the model and serial number of the affected drive (the RAID5 has both -JS and -KS series Western Digital 500GB drives) and found the culprit, one of the original -JS drives. Pulled it and applied for an advance RMA, which thankfully Western Digital does not charge you anything for (unlike Seagate).
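Since the drive letters move around between boots, the model and serial number are the only reliable way to match a failing /dev/sdX to a physical disk. A minimal sketch of how that lookup could be scripted, assuming the "Device Model:" and "Serial Number:" labels that smartctl -i prints for ATA/SATA drives; the function names drive_identity and list_drive_identities are made up for illustration and are not part of smartmontools:

```shell
# Hypothetical helper: pull model and serial out of `smartctl -i` output
# read from stdin. Assumes the "Device Model:" and "Serial Number:" labels.
drive_identity() {
    awk -F': *' '
        /^Device Model:/  { model = $2 }
        /^Serial Number:/ { serial = $2 }
        END { printf "%s\t%s\n", model, serial }
    '
}

# Hypothetical wrapper: print device, model, and serial for each whole disk.
# Needs root and smartmontools installed.
list_drive_identities() {
    for dev in /dev/sd[a-z]; do
        [ -b "$dev" ] || continue        # skip if the glob matched nothing
        printf '%s\t' "$dev"
        smartctl -i "$dev" | drive_identity
    done
}
```

For a single drive this would be used as smartctl -i /dev/sdb | drive_identity. The symlinks under /dev/disk/by-id/ carry the same model and serial information and stay stable across reboots, so they are another way to avoid depending on the letters.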
I ordered a replacement -KS drive from Amazon since it was nearly half the price of the drive at Fry’s, and it was far and away the cheapest way to get it here overnight thanks to Amazon Prime. It came in today, so I went ahead and slotted it into the server and fired it up. All it took was running fdisk on the fresh drive to create a single partition of type fd (Linux raid autodetect), and then adding it to the array:
mdadm /dev/md0 --add /dev/sdd1
That was probably not quite an hour ago and it’s been rebuilding automatically ever since. Current status from /proc/mdstat:
md0 : active raid5 sdd1[6] sdi1[0] sde1[5] sdc1[4] sdg1[3] sdh1[2]
      2441919680 blocks level 5, 64k chunk, algorithm 2 [6/5] [U_UUUU]
      [=====>...............]  recovery = 28.1% (137402240/488383936) finish=184.1min speed=31755K/sec
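The finish estimate can be sanity-checked by hand: take the blocks still to be rebuilt (total minus done, in 1K blocks), divide by the current speed in K/sec, and convert to minutes. A rough sketch of that arithmetic, assuming the recovery line keeps the format shown above; the function name mdstat_eta_minutes is made up for illustration:

```shell
# Hypothetical helper: estimate minutes remaining from mdstat-style text on stdin.
# Assumes a recovery line containing "(done/total)" and "speed=NNNNK/sec".
mdstat_eta_minutes() {
    awk '
        /recovery =/ {
            done_blocks = 0; total_blocks = 0; speed = 0
            for (i = 1; i <= NF; i++) {
                f = $i
                # "(done/total)" field: both counts are in 1K blocks
                if (f ~ /^\([0-9]+\/[0-9]+\)$/) {
                    gsub(/[()]/, "", f)
                    split(f, parts, "/")
                    done_blocks = parts[1] + 0
                    total_blocks = parts[2] + 0
                }
                # "speed=NNNNK/sec" field
                if (f ~ /^speed=[0-9]+K\/sec$/) {
                    sub(/^speed=/, "", f)
                    sub(/K\/sec$/, "", f)
                    speed = f + 0
                }
            }
            # remaining blocks / speed = seconds; divide by 60 for minutes
            if (speed > 0)
                printf "%.1f\n", (total_blocks - done_blocks) / speed / 60
        }
    '
}
```

Run as mdstat_eta_minutes < /proc/mdstat. With the numbers above, (488383936 - 137402240) / 31755 / 60 comes out to about 184.2 minutes, which lines up with the finish=184.1min the kernel reported. The speed changes as the rebuild goes on, so the figure is only a snapshot.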