It all started when I wanted a true HA configuration for the server nodes. That implies not only that the storage of each server is a RAID10 setup, but also that the boot and root resources are in a RAID1 configuration.
I started from an old and well-known tutorial for converting a standard Fedora/CentOS installation to a RAID1 based one: Spiceworks Tutorial
The general steps are the same, but without some very important updated commands and a few additional steps I was not able to complete the setup on Fedora 22. Note that the tutorial is from 29 April 2009 and a lot has changed in Fedora in the last 6 years.
I will keep the same steps as the original and I will mention the additions:
STEP 1: Put a new drive into the server. In my case I added a second SSD, a Samsung 830.
[root@nas1 ~]# lsscsi
[0:0:0:0] disk ATA Crucial_CT240M50 MU03 /dev/sda
[1:0:0:0] disk ATA SAMSUNG SSD 830 3B1Q /dev/sdb
Note that the new disk must be the same size or bigger than the first disk. I was lucky: the Crucial disk is 240GB and the Samsung disk is 250GB, so there is plenty of room.
Make sure to attach the disk to the same HBA as the first disk; this is very important for the latency of the RAID operations. I placed both SSDs in the top enclosure of the HP MicroServer, where the optical disk drive would normally go, and connected them to the motherboard HBA on the 6Gbps ports (0 and 1). The storage disks in the drive cage are connected to the extra LSI HBA installed in the PCI slot.
STEP 2: Print current partition layout for reference
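Assuming the original disk is /dev/sda, something like the following prints the numbers needed in the next step; dumping the table to a file is just a convenience, and the file name below is only an example:
# fdisk -l /dev/sda
# sfdisk -d /dev/sda > /root/sda-partition-table.txt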
STEP 3: Partition the second drive
Partition the new drive with the following command (a scripted sfdisk alternative is sketched after the list):
# fdisk /dev/sdb
a. Press ‘n’ to create a new partition
b. Press ‘p’ for primary partition
c. Press ‘1’ to create the first primary partition (this step might be automatically completed as there are no partitions yet)
d. Enter the first sector (recent fdisk works in sectors, not cylinders; the default normally matches the start of /dev/sda1)
e. Type in a last sector (or a +size value) so the partition is at least as large as the original /dev/sda1 partition
f. Type in ‘t’ to set the partition type
g. Type in the partition number
h. Type in ‘fd’ for Linux RAID type
i. Perform sub-steps a-h for the second primary partition /dev/sdb2, and create it identical to /dev/sda2
j. Type in ‘w’ to write the changes
k. As there is some extra space left, create a third primary partition that can later be used as a swap partition.
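If you prefer not to step through fdisk interactively, a scripted alternative is to clone the layout with sfdisk (a sketch only; double check the target device, as this overwrites the partition table of /dev/sdb):
# sfdisk -d /dev/sda | sfdisk /dev/sdb
You still have to change the partition types to ‘fd’ and create the extra swap partition afterwards, since /dev/sda does not contain them.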
STEP 4: Compare the partition layout between the two drives
# fdisk -l
The first two partitions of /dev/sda and /dev/sdb should now have matching sizes.
STEP 5: Create our new RAID mirror devices
Zero the superblock in case the new drive happened to be part of a linux software RAID before:
# mdadm --zero-superblock /dev/sdb1
# mdadm --zero-superblock /dev/sdb2
Create the new RAID devices:
# mdadm --create /dev/md0 --verbose --level=1 --raid-devices=2 missing /dev/sdb1
# mdadm --create /dev/md1 --verbose --level=1 --raid-devices=2 missing /dev/sdb2
We use the word ‘missing’ in place of the first drive: we will add it to the array only after we confirm the machine can boot and all the data is on the array (which for now has only one drive, the newly added /dev/sdb).
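Before moving on it is worth confirming that both mirrors came up as degraded arrays with one active member each:
# cat /proc/mdstat
# mdadm --detail /dev/md0
# mdadm --detail /dev/md1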
STEP 6: Build the /etc/mdadm.conf
In case you already have an /etc/mdadm.conf, move it aside as a backup and then create the new file with:
# mdadm --examine --scan >> /etc/mdadm.conf
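The file should end up with one ARRAY line per md device, something along these lines (the UUIDs and the name field below are purely illustrative, yours will differ):
ARRAY /dev/md0 metadata=1.2 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx name=nas1:0
ARRAY /dev/md1 metadata=1.2 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx name=nas1:1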
STEP 7: Format the new RAID boot partition
We can now format the RAID device that will be used in place of our boot partition on /dev/sda1. In our example system the original partition is an ext3 filesystem mounted as /boot. Format the new device with:
# mkfs.ext4 /dev/md0
STEP 8: Mount and build the new boot partition
We need to copy over the existing boot partition to the new RAID device:
# mkdir /mnt/boot
# mount /dev/md0 /mnt/boot
# cp -dpRx /boot/* /mnt/boot/
# umount /mnt/boot
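If you want to double check the copy before switching over, remount the array and compare the two trees (the diff should print nothing, assuming nothing wrote to /boot in between):
# mount /dev/md0 /mnt/boot
# diff -r /boot /mnt/boot
# umount /mnt/boot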
STEP 9: Mount new boot partition in place of old
We will now unmount the current /boot partition and mount our new /dev/md0 device in place of it to prepare for making a new initrd image.
# umount /boot
# mount /dev/md0 /boot
STEP 10: Build new initrd image
Now build a new initrd image that also contains the dm-mirror and the other RAID modules, just to be safe (without these the boot will fail because the new /boot will not be recognized).
Create the new initrd image:
# dracut -v -f
If we want to create it for a specific kernel (in case something goes wrong and we are forced to do this operation from a rescue system, as was the case for me 🙂 ):
# dracut -v -f /boot/initramfs-4.1.10-200.fc22.x86_64.img 4.1.10-200.fc22.x86_64
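To verify that the RAID bits really made it into the image, lsinitrd from the dracut package can list its contents (the file name below assumes the kernel version used above):
# lsinitrd /boot/initramfs-4.1.10-200.fc22.x86_64.img | grep -i raid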
STEP 11: Install grub on both drives
It is very important to add the following parameters to the kernel command line:
rd.auto rd.md.waitclean=1
where:
rd.auto, rd.auto=1: enable autoassembly of special devices like cryptoLUKS, dmraid, mdraid or lvm. Default is off as of dracut version >= 024.
rd.md.waitclean=1: wait for any resync, recovery, or reshape activity to finish before continuing
see solution at: https://ask.fedoraproject.org/en/question/71978/dracut-warning-could-not-boot/
Without the above parameters, for some weird reason, the system tries to boot before the software RAID of the boot partition is ready.
The first step is to modify the /etc/default/grub configuration that will be used to generate the new grub menu.
[root@nas1 ~]# cat /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_PRELOAD_MODULES="raid mdraid1x lvm2"
GRUB_CMDLINE_LINUX="rd.lvm.lv=fedora_localhost/root $([ -x /usr/sbin/rhcrashkernel-param ] && /usr/sbin/rhcrashkernel-param || :) quiet video=hyperv_fb:1024x768 zswap.enabled=1 zswap.zpool=zsmalloc console=ttyS1,115200n8 rd.auto rd.md.waitclean=1"
GRUB_DISABLE_RECOVERY="true"
GRUB_SERIAL_COMMAND="serial --speed=115200 --unit=0 --word=8 --parity=no --stop=1"
GRUB_CMDLINE_LINUX_DEFAULT="video=1024x768"
Note the GRUB_PRELOAD_MODULES line, which pre-loads some needed modules without which we are not able to boot successfully. It is very important to add them because they are not loaded by default.
Create the new grub menu:
# grub2-mkconfig -o /boot/grub2/grub.cfg
Install grub in the MBR of both physical drives (in the next step we edit /etc/fstab to reference the new boot partition by its UUID tag):
# grub2-install --recheck /dev/sda
# grub2-install --recheck /dev/sdb
STEP 12: Edit /etc/fstab to reflect new /boot location
Determine the UUID tag of our new boot device:
[root@nas1 ~]# blkid /dev/md0
/dev/md0: UUID="acdb559b-6a48-40b5-9221-76e192176c0d" TYPE="ext4"
Add the UUID from above to identify the boot partition. It is very important not to use symbolic names like /dev/md0, because some systems tend to use different names between boots. When I initially used /dev/md0 I ended up with an un-bootable system: my boot RAID device was getting assembled as /dev/md126, and grub was thus unable to find the boot partition.
#
# /etc/fstab
# Created by anaconda on Tue Nov 4 13:32:16 2014
#
# Accessible filesystems, by reference, are maintained under ‘/dev/disk’
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/fedora_localhost-root / ext4 defaults 1 1
UUID=acdb559b-6a48-40b5-9221-76e192176c0d /boot ext4 defaults 1 2
/dev/mapper/fedora_localhost-home /home ext4 defaults 1 2
/dev/lvm-5T/lvm0 /media/storage ext4 defaults 1 2
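Before rebooting, a quick way to make sure the new fstab entry is correct (assuming nothing is currently holding /boot busy) is to remount it through fstab and check where it ended up:
# umount /boot
# mount /boot
# df -h /boot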
STEP 13: Boot with the new boot partition
Reboot the system and check that we have a system with a RAID1 boot partition.
# reboot
STEP 14: Disaster recovery
In case something goes wrong:
– you forgot to update fstab with the UUID or you used the symbolic notation /dev/md0 instead of the UUID
– you forgot to add the extra modules and parameters in /etc/default/grub
– etc.
the following procedure can be followed to get back on track.
1. Boot using a rescue system, a live CD, etc. In my case I have a Fedora Live rescue image on a bootable memory card in the HP MicroServer.
2. Identify which md device holds our boot partition and also find the root logical volume (in our case /dev/mapper/fedora_localhost-root, which lives on /dev/sda2):
# cat /proc/mdstat
In our case the boot partition was assembled by the Live Rescue system as /dev/md127.
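If the rescue environment does not assemble the array on its own, it can usually be brought up manually before mounting anything (then re-check /proc/mdstat for the device name it was given):
# mdadm --assemble --scan
# cat /proc/mdstat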
Activate the LVM volume group if needed, mount the root logical volume under /mnt, bind-mount the special directories and then mount the boot partition:
# vgchange -ay fedora_localhost
# mount /dev/mapper/fedora_localhost-root /mnt
# mount --bind /dev /mnt/dev
# mount --bind /sys /mnt/sys
# mount --bind /proc /mnt/proc
# mount /dev/md127 /mnt/boot
Chroot into the mounted directory to start using it as the root filesystem:
# chroot /mnt
Fix what you have to fix and then recreate the initrd image and the grub menu.
Create the new image for the kernel installed on the server. A plain dracut -v -f is not enough here, because you would end up with an initramfs built for the Live Rescue kernel instead.
# dracut -v -f /boot/initramfs-4.1.10-200.fc22.x86_64.img 4.1.10-200.fc22.x86_64
Create the new grub menu:
# grub2-mkconfig -o /boot/grub2/grub.cfg
Install grub in the MBR of both physical drives and make sure /etc/fstab references the new boot partition by its UUID tag:
# grub2-install --recheck /dev/sda
# grub2-install --recheck /dev/sdb
Reboot and hope for the best.
STEP 15: Change first drive /dev/sda1 to Linux RAID type
Modify the old /boot device (/dev/sda1) to be Linux RAID type. This will prepare it so it can be added to our RAID device /dev/md0 (which our new /boot is using).
# fdisk /dev/sda
a. Type in ‘t’ to set the partition type
b. Type in the partition number (1)
c. Type in ‘fd’ for Linux RAID type
d. Type in ‘w’ to write the changes and exit fdisk
STEP 16: Add /dev/sda1 to /dev/md0 mirror
Remove the old filesystem label from /dev/sda1 and then add the partition to the /dev/md0 mirror:
# e2label /dev/sda1 ""
# mdadm --add /dev/md0 /dev/sda1
STEP 17: Wait for the /dev/md0 mirror to rebuild
Watch the /proc/mdstat output to make sure the syncing of the newly completed /dev/md0 device finishes before continuing.
# watch cat /proc/mdstat
Once it is complete you may continue. Since the /boot partition is only a couple of hundred megabytes, this usually finishes very quickly.
STEP 18: Create a new LVM physical volume
Create a new LVM PV on the second RAID device. We will extend the current LVM volume group onto this new PV.
# pvcreate -v /dev/md1
STEP 19: Extend volume group to new PV
Use the `vgs` command to show the name of the LVM VG. In our example it is fedora_localhost.
Run the following command, replacing fedora_localhost with the actual name of your volume group. You may have to repeat this step for each VG if you have multiple VGs on the /dev/sda2 disk:
# vgextend -v fedora_localhost /dev/md1
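A quick check with `vgs` afterwards should show the volume group spanning two physical volumes (a pv_count of 2) with roughly double the previous size:
# vgs -o vg_name,pv_count,vg_size,vg_free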
STEP 20: Move logical volume to new PV
Use `pvdisplay` to verify the VG is now using both drives:
# pvdisplay
Now we can move the LVs to the new PV.
Use `lvs` to see the logical volume names (all of them have to move), then move everything off the old PV in one go:
# pvmove -v /dev/sda2 /dev/md1
Wait a while… it will output the percentage done every 15 seconds.
Depending on the size of your original drive this may take an hour or more. It is now copying everything from the first drive to the second drive while keeping the machine online.
It may happen that you cannot move the LVs due to lack of space. Even though the PVs are roughly the same size, I still got that complaint. My solution was to delete the swap LV from the volume group (a sketch follows below); another option is to shrink one of the existing LVs.
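For reference, a minimal sketch of the swap-LV route, assuming the swap LV is called fedora_localhost/swap (adjust the name to what `lvs` reports on your system, and remember to remove or comment out the corresponding swap line in /etc/fstab):
# swapoff /dev/fedora_localhost/swap
# lvremove fedora_localhost/swap
Swap can later be recreated on the spare third partition created on the new disk in STEP 3.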
STEP 21: Remove the VG from the old PV
Once all the data has finished moving, we want to ‘reduce’ the volume group to only use the new physical volume and then remove the old physical volume so it can no longer be used for LVM.
# vgreduce -v fedora_localhost /dev/sda2
# pvremove /dev/sda2
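A quick check with `pvs` afterwards should list only /dev/md1 (and no longer /dev/sda2) as a physical volume:
# pvs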
STEP 22: Reboot
Reboot the system and check that the root LVM volume is now backed by the new /dev/md1 device.
STEP 23: Change first drive /dev/sda2 to Linux RAID type
Change the old LVM partition to Linux RAID type:
# fdisk /dev/sda
a. Type in ‘t’ to set the partition type
b. Type in the partition number (2)
c. Type in ‘fd’ for Linux RAID type
d. Type in ‘w’ to write the changes and exit fdisk
Verify the partition type is now Linux RAID:
# fdisk -l
STEP 24: Add /dev/sda2 to /dev/md1 mirror
We are now ready to add the old /dev/sda2 LVM partition to the /dev/md1 RAID device:
# mdadm --add /dev/md1 /dev/sda2
STEP 25: Wait for the /dev/md1 mirror to rebuild
Watch the /proc/mdstat output to make sure the syncing of the newly completed /dev/md1 device finishes before rebooting.
# watch cat /proc/mdstat
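Before the reboot it is worth double checking that both mirrors report two active, working devices and a clean state, and that the block device layout looks as expected:
# mdadm --detail /dev/md0 /dev/md1 | grep -E 'State|Devices'
# lsblk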
STEP 26: Final reboot
Do a final reboot to ensure that all is OK.
In case you end up with an unbootable system you can always use STEP 14 to recover.