So I tried to recover RAID1 and GRUB on an EFI-enabled Linux (AlmaLinux) system, and I ran into a few unexpected issues. Even ChatGPT and Google couldn’t answer this, and I was surprised I couldn’t find any answers online, since I am sure many of us run RAID1 on EFI-enabled Linux systems.
I have two 4TB Samsung NVMe SSDs. They are mapped as /dev/nvme0n1 and /dev/nvme1n1 and run in a RAID1 configuration using Linux software RAID (md). When the first SSD died, I was left with a working system on the second SSD; that normally works fine, you are just left without the RAID redundancy. I rebooted to confirm it all still worked: the system booted in UEFI mode, and the “UEFI OS” boot entries were still visible in the BIOS. I waited a few days for the replacement SSD to arrive, then installed the new SSD and went through my standard RAID1 recovery procedure, which I had previously used on SATA SSD / NVMe SSD and HDD non-EFI systems.
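Before swapping the drives, it’s worth confirming the degraded state and seeing which array member dropped out:
cat /proc/mdstat            # a degraded RAID1 array shows a missing member, e.g. [U_]
mdadm --detail /dev/md124   # the failed slot is listed as "removed"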
First, I copied the existing partition table from the working drive (/dev/nvme1n1):
sgdisk -R /dev/nvme0n1 /dev/nvme1n1
Where nvme1n1 is the source and nvme0n1 is the destination.
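A small safety net I would also recommend: dump the surviving disk’s partition table to a file first, so you can restore it if anything goes wrong.
sgdisk --backup=/root/nvme1n1-gpt.bak /dev/nvme1n1        # save the GPT to a file
sgdisk --load-backup=/root/nvme1n1-gpt.bak /dev/nvme1n1   # restore it, should you ever need to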
After this I re-added the new partitions to my RAID arrays, for example /dev/md124, which is my /boot/efi array:
mdadm /dev/md124 -a /dev/nvme0n1p4
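Each new partition goes into its own array. In my layout (see the lsblk output further down) the mapping is p1 -> md127, p2 -> md125, p3 -> md126, p4 -> md124, p5 -> md123, so the full re-add looks like this; adjust it to your own layout and watch the resync progress in /proc/mdstat:
mdadm /dev/md127 -a /dev/nvme0n1p1
mdadm /dev/md125 -a /dev/nvme0n1p2
mdadm /dev/md126 -a /dev/nvme0n1p3
mdadm /dev/md124 -a /dev/nvme0n1p4
mdadm /dev/md123 -a /dev/nvme0n1p5
watch cat /proc/mdstat    # wait until all arrays finish resyncing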
After the RAID rebuild (resync) finished, I got:
mdadm --detail /dev/md124
/dev/md124:
Version : 1.0
Creation Time : Sat May 18 21:12:23 2024
Raid Level : raid1
Array Size : 205760 (200.94 MiB 210.70 MB)
Used Dev Size : 205760 (200.94 MiB 210.70 MB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Sat Dec 14 23:21:25 2024
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Consistency Policy : bitmap
Name : srv-new:boot_efi (local to host srv-new)
UUID : 0cf06dac:1ab929b9:76e17bb3:63233042
Events : 88

Number Major Minor RaidDevice State
2 259 5 0 active sync /dev/nvme0n1p4
1 259 10 1 active sync /dev/nvme1n1p4
So I went on to install a GRUB EFI boot entry via efibootmgr:
efibootmgr --create --disk /dev/nvme0n1 --part 4 --label "UEFI OS" --loader '\EFI\BOOT\BOOTX64.EFI'
But whatever I did, I always got only ONE entry:
efibootmgr -v
BootCurrent: 0000
Timeout: 1 seconds
BootOrder: 0000,0004
Boot0000* UEFI OS HD(4,GPT,bac8b516-1107-4558-a567-a60ad674c7f8,0x14231000,0x64800)/File(\EFI\BOOT\BOOTX64.EFI)
After a lot of googling, and with ChatGPT having no clue, I confirmed in the BIOS that the system would only see the boot partition on DISK2 and wouldn’t boot off DISK1.
Then I realized that the PARTUUIDs had been copied verbatim from the source disk onto /dev/nvme0n1, and a PARTUUID is supposed to be UNIQUE:
lsblk -o NAME,PARTUUID
NAME PARTUUID
nvme0n1
├─nvme0n1p1 00056612-3823-4c89-8333-16b800929e95
│ └─md127
├─nvme0n1p2 e485e4b7-5110-4928-a285-205779b9f2df
│ └─md125
├─nvme0n1p3 aa9817cd-6421-4ea6-8ecb-6d422eb71515
│ └─md126
├─nvme0n1p4 e6b33e11-a0e9-4681-8ec8-653e30a65ffa
│ └─md124
└─nvme0n1p5 74fc09a2-6672-4c28-8866-f2a5e1ce3104
└─md123
nvme1n1
├─nvme1n1p1 00056612-3823-4c89-8333-16b800929e95
│ └─md127
├─nvme1n1p2 e485e4b7-5110-4928-a285-205779b9f2df
│ └─md125
├─nvme1n1p3 aa9817cd-6421-4ea6-8ecb-6d422eb71515
│ └─md126
├─nvme1n1p4 e6b33e11-a0e9-4681-8ec8-653e30a65ffa
│ └─md124
└─nvme1n1p5 74fc09a2-6672-4c28-8866-f2a5e1ce3104
└─md123
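By the way, you can spot the duplicates at a glance; every PARTUUID printed by this one-liner exists on both disks:
lsblk -rno PARTUUID /dev/nvme0n1 /dev/nvme1n1 | awk 'NF' | sort | uniq -d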
So I needed to generate new GUIDs/PARTUUIDs for the partitions on /dev/nvme0n1. For that I used gdisk:
gdisk /dev/nvme0n1
Press ‘x’ to enter the expert menu, then press ‘c’ to change a partition’s unique GUID; when gdisk prompts for the new GUID, enter ‘R’ to randomize it.
Repeat this for every partition on /dev/nvme0n1, then type ‘w’ to write the changes to disk.
After exiting gdisk, run
partprobe /dev/nvme0n1
to re-read the partition table.
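If you prefer a non-interactive route, sgdisk can do the same job: its -u / --partition-guid option accepts ‘R’ for a random GUID. A sketch for my five-partition layout (adjust the partition numbers to yours):
for p in 1 2 3 4 5; do
    sgdisk --partition-guid=${p}:R /dev/nvme0n1   # new random unique GUID for partition $p
done
partprobe /dev/nvme0n1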
After that I confirmed with
lsblk -o NAME,PARTUUID
that the PARTUUIDs are now unique on both disks:
NAME PARTUUID
nvme0n1
├─nvme0n1p1 f391024d-b6f8-4036-a004-2ed55fd0d7f7
│ └─md127
├─nvme0n1p2 a97744a6-aeb5-4d12-9c50-6da437e10dce
│ └─md125
├─nvme0n1p3 6aba1d11-17ce-4860-8d8f-f4a6f4b97c89
│ └─md126
├─nvme0n1p4 bac8b516-1107-4558-a567-a60ad674c7f8
│ └─md124
└─nvme0n1p5 fd4e76aa-968c-448d-bc58-8371aef44a5f
└─md123
nvme1n1
├─nvme1n1p1 00056612-3823-4c89-8333-16b800929e95
│ └─md127
├─nvme1n1p2 e485e4b7-5110-4928-a285-205779b9f2df
│ └─md125
├─nvme1n1p3 aa9817cd-6421-4ea6-8ecb-6d422eb71515
│ └─md126
├─nvme1n1p4 e6b33e11-a0e9-4681-8ec8-653e30a65ffa
│ └─md124
└─nvme1n1p5 74fc09a2-6672-4c28-8866-f2a5e1ce3104
└─md123
Now, after installing the GRUB EFI entry again with
efibootmgr --create --disk /dev/nvme0n1 --part 4 --label "UEFI OS" --loader '\EFI\BOOT\BOOTX64.EFI'
I finally got two entries: one for the GRUB loader on /dev/nvme0n1 (the first disk) and one for /dev/nvme1n1 (the second disk):
BootCurrent: 0000
Timeout: 1 seconds
BootOrder: 0000,0004
Boot0000* UEFI OS HD(4,GPT,bac8b516-1107-4558-a567-a60ad674c7f8,0x14231000,0x64800)/File(\EFI\BOOT\BOOTX64.EFI)
Boot0004* UEFI OS HD(4,GPT,e6b33e11-a0e9-4681-8ec8-653e30a65ffa,0x14231000,0x64800)/File(\EFI\BOOT\BOOTX64.EFI)..BO
I verified that the BIOS now sees both entries, and booting from either disk works now.
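And if you end up with stale or duplicate boot entries while experimenting, efibootmgr can also delete an entry by number and set the boot order (0003 below is just a made-up example number):
efibootmgr -b 0003 -B       # delete entry Boot0003 (made-up example)
efibootmgr -o 0000,0004     # try disk1’s loader first, then disk2’s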
So keep in mind that you cannot simply copy the partition table from the existing working RAID drive when you recreate the partition table on the replacement. Either create all partitions manually to match the working drive, in which case the PARTUUIDs/GUIDs will already be unique, or use the sgdisk partition-table copy method, but then don’t forget to generate new unique GUIDs/PARTUUIDs afterwards, so that efibootmgr sees two separate disks and can create an entry for each. On RAID1 systems this is very important.
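In fact, sgdisk has a flag for exactly this case: -G (--randomize-guids) randomizes the disk’s GUID and all partition unique GUIDs in one go. So next time the copy step should look like this (I haven’t re-run a full recovery this way yet, but this is what the flags are documented to do):
sgdisk -R /dev/nvme0n1 /dev/nvme1n1   # replicate the partition table: nvme1n1 -> nvme0n1
sgdisk -G /dev/nvme0n1                # randomize the disk GUID and all partition GUIDs
partprobe /dev/nvme0n1                # re-read the partition table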
Hope this helps someone