Tuesday, March 3, 2015

How to replace a disk in RAID6

On one of our servers, a disk had some problems, so it was time to replace it.
The machine runs Ubuntu 10.04 with 4 disks in RAID 6 and uses LVM.

First, let's "remove" the failing disk (sdb2) from the RAID:
$ mdadm --manage /dev/md0 --fail /dev/sdb2
mdadm: set /dev/sdb2 faulty in /dev/md0

$ mdadm --manage /dev/md0 --remove /dev/sdb2
mdadm: hot removed /dev/sdb2 from /dev/md0

Now if you run lsblk, you can see that the disk no longer belongs to the RAID.

-> shut down the machine
-> replace the disk
-> start the machine

When the machine started, there was a problem with grub and the grub rescue prompt showed up.

To avoid messing with grub, I decided to use the Ubuntu Desktop live CD. I put it on a USB stick and booted the machine from it.

I had the following issue:
missing parameter in configuration file. keyword path
To fix it, simply type live and press enter.
Note: if you hit tab, you can see a list of options.

Perfect, it is working. Now let's install some packages:
$ sudo apt-get install mdadm lvm2

Let's create the partitions on the new disk (this didn't work for me):
$ parted -a optimal /dev/sdb
(parted) mklabel gpt
(parted) mkpart primary 1 2
(parted) set 1 bios_grub on
(parted) mkpart primary 2 -1
(parted) set 2 raid on
(parted) print

With this method, a small unallocated space was left at the end of the disk, so sdb2 came out too small and I couldn't add the disk to the RAID.

Instead I used dd.
First you need to calculate the count parameter.
count = (128*N)+1024
Where N is the number of partitions you have. In this case I had 2, so the result is 1280.
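
The formula above can be checked with shell arithmetic. This is only a sketch: the function name is made up for illustration, and the comment about where the 1024 comes from is my reading of the usual GPT layout with 512-byte sectors and 128-byte partition entries, not something stated by the tools.

```shell
# gpt_copy_bytes N: number of bytes to copy for a table with N partitions.
# Assumed layout: 512 bytes (protective MBR) + 512 bytes (GPT header),
# then 128 bytes per partition entry.
gpt_copy_bytes() {
    echo $(( 128 * $1 + 1024 ))
}

gpt_copy_bytes 2    # prints 1280
```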

The following commands will copy the partition table from /dev/sda to /dev/sdb. Make sure you type the second command correctly: a wrong of= target overwrites the beginning of another disk!

$ dd if=/dev/sda of=GPT_TABLE bs=1 count=1280
$ dd if=GPT_TABLE of=/dev/sdb bs=1 count=1280
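
Before rebooting, the copy can be double-checked by comparing the first bytes of both disks. A minimal sketch, assuming GNU cmp (its -n option limits the comparison to a number of bytes); same_prefix is a hypothetical helper name, not an existing command:

```shell
# same_prefix A B COUNT: succeeds only if the first COUNT bytes
# of A and B are identical (-s keeps cmp quiet).
same_prefix() {
    cmp -s -n "$3" "$1" "$2"
}

# On the real machine this would be run as root, for example:
#   same_prefix /dev/sda /dev/sdb 1280 && echo "partition table copied"
```

It only confirms that the bytes written by dd match the source; it says nothing about the rest of the disk.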

Then I had to reboot the machine.

Finally add /dev/sdb2 to the RAID:
$ mdadm --manage /dev/md0 --add /dev/sdb2
mdadm: added /dev/sdb2

The rebuild will then take some time. You can follow its progress with this command:
$ cat /proc/mdstat
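
To print only the progress figure instead of the whole file, something like the following should work. It is a sketch that assumes the progress line in /proc/mdstat contains text of the form "recovery = NN.N%" (or "resync = NN.N%") and that grep supports -o; rebuild_progress is a made-up helper name.

```shell
# rebuild_progress [FILE]: print the recovery/resync percentage found in
# an mdstat-style file (defaults to /proc/mdstat).
rebuild_progress() {
    grep -Eo '(recovery|resync) *= *[0-9.]+%' "${1:-/proc/mdstat}"
}

# Example on the rebuilt machine:
#   rebuild_progress
# Nothing is printed (and the exit status is non-zero) when no rebuild is running.
```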

My original disks are Seagate, and I first tried to add a Western Digital one. Unfortunately it was smaller by 1M, so it couldn't be added to the RAID. I ordered a similar Seagate disk, and its size was OK.
The difference is not much, but it is enough. For example (fdisk -l):
Seagate Barracuda /dev/sda 2,000,396,746,752 bytes
Western Digital   /dev/sdb 2,000,395,698,176 bytes
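
The gap between the two sizes above can be computed with shell arithmetic. The function name is hypothetical; on a live system, `blockdev --getsize64 /dev/sdX` prints a size in bytes that can be fed in the same way.

```shell
# Sizes in bytes, taken from the fdisk -l output above.
seagate=2000396746752
wd=2000395698176

# size_gap_bytes REFERENCE CANDIDATE: how many bytes the candidate is short
# of the reference; a positive result means the candidate is too small.
size_gap_bytes() {
    echo $(( $1 - $2 ))
}

size_gap_bytes "$seagate" "$wd"    # prints 1048576, exactly 1 MiB
```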

The best solution: never use the full disk! Leave 10 or 15M unallocated at the end, so that disks with small differences in size can still be added.