rebuild | How to , and other stuff about linux, photo, php ...

August 14, 2015

Reinstall grub after raid crash

Filed under: Linux — Tags: grub, Linux, rebuild, startup — admin @ 5:51 pm

Today a client with a Hetzner server had a problem: both hard drives in the RAID had failed. One was lost for good and the other is in bad shape.
After I ran the recovery to rebuild the RAID, the server still didn't boot. My /dev/sda was faulty and had been replaced, but the server was stuck and wouldn't start.
I had to reinstall GRUB. The Hetzner wiki has a tutorial, but it didn't match my setup exactly.
So, to get started, reboot the server into the rescue system.
My partition layout was something like this:
/dev/md1 (/boot), /dev/md3 (/var) and /dev/md4 (/)

I ran these commands, in this order:
mount /dev/md4 /mnt
mount /dev/md1 /mnt/boot
mount /dev/md3 /mnt/var
After this:

chroot-prepare /mnt
chroot /mnt
grub-install /dev/sdb
grub-install /dev/sda

And the server was up and running.
From here you should be able to copy the partition table to the new hard drive and start rebuilding the RAID.
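
For the partition-table step, here is a rough sketch that only prints the commands instead of running them. It assumes /dev/sdb is the surviving disk and /dev/sda is the replacement. The array:partition pairs (md1:1 and so on) are hypothetical placeholders, because the real mapping depends on your layout, and the function name plan_raid_restore is made up for this example.

```shell
#!/bin/bash
# Sketch only: prints the commands, it does not execute anything.
# Assumptions: src is the healthy disk, dst is the new empty disk,
# and each "array:number" pair maps an md array to a partition number.

plan_raid_restore() {
    local src="$1" dst="$2" scheme="$3"   # scheme: "mbr" or "gpt"
    shift 3
    if [ "$scheme" = "gpt" ]; then
        echo "sgdisk --replicate=$dst $src"   # copy the GPT from src to dst
        echo "sgdisk --randomize-guids $dst"  # give dst its own GUIDs
    else
        echo "sfdisk -d $src | sfdisk $dst"   # dump the MBR table and apply it
    fi
    local pair
    for pair in "$@"; do
        # md1:1 becomes: mdadm /dev/md1 --add <dst>1
        echo "mdadm /dev/${pair%%:*} --add ${dst}${pair##*:}"
    done
}

# Hypothetical mapping, adjust to what /proc/mdstat shows on your server
plan_raid_restore /dev/sdb /dev/sda mbr md1:1 md3:3 md4:4
```

Read the printed commands carefully before running any of them by hand. Swapping source and target would overwrite the partition table of the only good disk.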

Another tutorial suggests something like this:

# mount /dev/md1 /mnt
# mount -t none -o bind /dev /mnt/dev
# mount -t proc -o bind /proc /mnt/proc
# mount -t sysfs -o bind /sys /mnt/sys
# chroot /mnt

# grub

Look for the file stage1 to find the boot partitions

grub> find /grub/stage1
(hd0,1)
(hd1,1)

Install the bootloader on both partitions. Both are regarded as hd0 from the point of view of the bootloader at boot time.

grub> device (hd0) /dev/sda
device (hd0) /dev/sda
grub> root (hd0,1)
root (hd0,1)
Filesystem type is ext2fs, partition type 0xfd
grub> setup (hd0)
setup (hd0)
Checking if "/boot/grub/stage1" exists... yes
Checking if "/boot/grub/stage2" exists... yes
Checking if "/boot/grub/e2fs_stage1_5" exists... yes
Running "embed /boot/grub/e2fs_stage1_5 (hd0)"... 17 sectors are embedded.
succeeded
Running "install /boot/grub/stage1 (hd0) (hd0)1+17 p (hd0,1)...
succeeded
Done.

The same for the other disk:

grub> device (hd0) /dev/sdb
grub> root (hd0,1)
grub> setup (hd0)
grub> quit
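
If you prefer not to type this interactively, the same command sequence can be generated and piped into the GRUB legacy shell. This is a minimal sketch, assuming GRUB legacy (the version with stage1/stage2) and boot partition index 1 on both disks, as in the find output. The name grub_legacy_script is made up here. Each disk is mapped to (hd0) before setup, so the root is written as (hd0,N) for every disk.

```shell
#!/bin/bash
# Sketch: print GRUB legacy commands that install the boot loader on each
# disk in turn. Nothing is written to disk by this function itself.

grub_legacy_script() {
    local part="$1"   # partition index as GRUB counts it (0-based)
    shift
    local disk
    for disk in "$@"; do
        printf 'device (hd0) %s\nroot (hd0,%s)\nsetup (hd0)\n' "$disk" "$part"
    done
    echo quit
}

# Show what would be sent to GRUB
grub_legacy_script 1 /dev/sda /dev/sdb

# Inside the chroot this could be applied with (assumption: the legacy
# "grub" binary is installed and supports batch mode):
#   grub_legacy_script 1 /dev/sda /dev/sdb | grub --batch
```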

Another website shows a slightly different approach:

mount /dev/md2 /mnt
mount /dev/md1 /mnt/boot
mount -o bind /dev /mnt/dev
mount -o bind /proc /mnt/proc
mount -o bind /sys /mnt/sys
chroot /mnt
For Debian:
apt-get install --reinstall grub-pc
dpkg-reconfigure grub-pc
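
With GRUB 2 the boot loader should still end up on every disk that is part of the RAID, as with the two grub-install calls earlier. Below is a small sketch for listing those disks from /proc/mdstat. It assumes plain sdX device names (it will not handle names like nvme0n1p1), and raid_member_disks is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: list the physical disks behind all md arrays.
# Reads mdstat-formatted text on stdin.

raid_member_disks() {
    awk '/^md[0-9]+ :/ {
        # Fields from 5 onward look like sdb3[2] or sdb3[2](S)
        for (i = 5; i <= NF; i++) {
            d = $i
            sub(/[0-9]*\[.*/, "", d)   # strip partition number and suffix
            print "/dev/" d
        }
    }' | sort -u
}

# Sample input in the same format as /proc/mdstat
raid_member_disks <<'EOF'
md1 : active raid1 sdb2[2] sda2[0]
      524276 blocks super 1.2 [2/2] [UU]
md0 : active raid1 sdb1[2] sda1[0]
      4193268 blocks super 1.2 [2/2] [UU]
EOF

# On a real system, inside the chroot, something like:
#   for d in $(raid_member_disks < /proc/mdstat); do grub-install "$d"; done
```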


November 20, 2014

How to get md RAID array to rebuild even if read errors

Filed under: Linux — Tags: fail, Linux, raid1, rebuild, spare — admin @ 3:14 pm

I hope this helps someone in the same situation, because I couldn't find all of this information in one place on the internet.
My setup was a software RAID 1. According to smartctl both hard drives looked pretty messed up, so I decided to replace sdb first. I removed sdb from the RAID and opened a ticket with the datacenter, but they refused to change the drive because they considered it OK. So I put sdb back, and got a big surprise: the RAID didn't rebuild and ended up failed, with the sdb partition marked as a spare (S).

cat /proc/mdstat
Personalities : [raid1]
md3 : active raid1 sdb4[2] sda4[0]
1851802943 blocks super 1.2 [2/2] [UU]

md2 : active raid1 sdb3[2](S) sda3[0]
1073740664 blocks super 1.2 [2/1] [U_]

md1 : active raid1 sdb2[2] sda2[0]
524276 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sdb1[2] sda1[0]
4193268 blocks super 1.2 [2/2] [UU]

unused devices: <none>
With 1 TB to resync, that is a lot of waiting. So panic set in: why was sdb3 in the (S) state instead of being put back in place? When I looked at the dmesg log I got a surprise: because some sectors couldn't be read from sda, the rebuild failed :(. This is a nasty situation, because if both hard drives have even one bad sector each, you can't replace either of them.
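
To spot this state without reading the whole table by eye, a short awk filter can pick out the arrays with a missing member and show which partitions are parked as spares. This is only a sketch: degraded_arrays is a made-up name, and it assumes the usual "mdN : active raidX ..." line layout shown above.

```shell
#!/bin/bash
# Sketch: print every degraded md array together with its (S) spare members.
# Reads mdstat-formatted text on stdin.

degraded_arrays() {
    awk '
        /^md[0-9]+ :/ {
            name = $1; spares = ""
            for (i = 5; i <= NF; i++)
                if ($i ~ /\(S\)$/) spares = spares " " $i
        }
        # A status like [U_] or [_U] means one mirror half is missing
        /\[[U_]*_[U_]*\]/ { print name ":" spares }
    '
}

# Sample taken from the mdstat output above
degraded_arrays <<'EOF'
md3 : active raid1 sdb4[2] sda4[0]
      1851802943 blocks super 1.2 [2/2] [UU]
md2 : active raid1 sdb3[2](S) sda3[0]
      1073740664 blocks super 1.2 [2/1] [U_]
EOF

# On a live system: degraded_arrays < /proc/mdstat
```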

So what is the solution? There is no magic option like force or skip, or perhaps one exists and I just didn't find it.
The dmesg output looked something like this:
[7646674.321121] end_request: I/O error, dev sda, sector 1251467520
[7646674.321148] raid1: sda3: rescheduling sector 1242024160
[7646691.820040] end_request: I/O error, dev sda, sector 1251467520
[7646691.841995] raid1:md2: read error corrected (8 sectors at 1242026240 on sda3)
[7646709.327315] end_request: I/O error, dev sda, sector 1251467528
[7646709.327923] raid1:md2: read error corrected (8 sectors at 1242026248 on sda3)
[7646726.817936] end_request: I/O error, dev sda, sector 1251467536
[7646726.818461] raid1:md2: read error corrected (8 sectors at 1242026256 on sda3)
[7646744.291904] end_request: I/O error, dev sda, sector 1251467544
[7646744.292431] raid1:md2: read error corrected (8 sectors at 1242026264 on sda3)
[7646762.473787] end_request: I/O error, dev sda, sector 1251467560
[7646762.474294] raid1:md2: read error corrected (8 sectors at 1242026280 on sda3)
[8337479.557401] end_request: I/O error, dev sda, sector 1251471424
[8337479.557427] raid1: sda3: rescheduling sector 1242028000
[8337506.426294] end_request: I/O error, dev sda, sector 1251471432
[8337506.480066] raid1:md2: read error corrected (8 sectors at 1242030152 on sda3)
[8337507.764690] raid1: sda3: redirecting sector 1242028000 to another mirror
[8769133.060807] end_request: I/O error, dev sda, sector 1251476840
[8769133.060833] raid1: sda3: rescheduling sector 1242033496
[8769150.443149] end_request: I/O error, dev sda, sector 1251476840
[8769150.443726] raid1:md2: read error corrected (8 sectors at 1242035560 on sda3)
[8769150.443772] raid1: sda3: redirecting sector 1242033496 to another mirror

This is the output after a grep for the sector messages. Modern hard drives can reallocate a bad sector, but only when a write to that sector occurs. So I started using hdparm to check and rewrite those sectors.
First I ran a grep on dmesg to see the sectors:
dmesg | grep -i sector
or something similar, to find out which sectors are affected.
For me the output was something like this:
[9328180.676069] end_request: I/O error, dev sda, sector 1251477144
[9328249.705549] end_request: I/O error, dev sda, sector 1251477144
[9338442.525476] end_request: I/O error, dev sda, sector 1251477144
[9338482.362380] end_request: I/O error, dev sda, sector 1251477144
[9328232.631537] end_request: I/O error, dev sda, sector 1251471488
[9338452.461797] end_request: I/O error, dev sda, sector 1251471488
[9338499.453213] end_request: I/O error, dev sda, sector 1251471488
[9328169.224894] end_request: I/O error, dev sda, sector 1251471456
[9328215.499044] end_request: I/O error, dev sda, sector 1251471456
[9338465.239646] end_request: I/O error, dev sda, sector 1251471456
[9338520.575210] end_request: I/O error, dev sda, sector 1251471456
[9338456.717853] end_request: I/O error, dev sda, sector 1251470641
[9338503.592660] end_request: I/O error, dev sda, sector 1251470641
[9328154.873250] end_request: I/O error, dev sda, sector 1251470640
[9328198.474859] end_request: I/O error, dev sda, sector 1251470640

Not in this order, though; this is after I sorted by sector number.
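
The sorting and de-duplicating can be done in one pipeline. Here is a minimal sketch, assuming the kernel message contains "I/O error, dev sda, sector N" as in the log above (newer kernels use a different prefix but keep that part). The helper name bad_sectors is made up.

```shell
#!/bin/bash
# Sketch: extract the unique failing sector numbers for one device from
# dmesg-style text on stdin, sorted numerically.

bad_sectors() {
    local dev="$1"
    sed -n "s|.*I/O error, dev $dev, sector \([0-9][0-9]*\).*|\1|p" | sort -n -u
}

# Sample lines copied from the log above
bad_sectors sda <<'EOF'
[9328180.676069] end_request: I/O error, dev sda, sector 1251477144
[9338456.717853] end_request: I/O error, dev sda, sector 1251470641
[9328154.873250] end_request: I/O error, dev sda, sector 1251470640
[9328249.705549] end_request: I/O error, dev sda, sector 1251477144
EOF

# On a live system: dmesg | bad_sectors sda
```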
To check a sector you can do this:

hdparm --read-sector 1251470640 /dev/sda
/dev/sda: Input/Output error

PLEASE NOTE THAT WRITING TO THOSE SECTORS IS AT YOUR OWN RISK AND IRREVOCABLE DATA LOSS MAY OCCUR.
I STRONGLY SUGGEST HAVING A BACKUP IN CASE NOTHING ELSE WORKS AND A REINSTALL OF THE OS IS NEEDED.

To write to the sector, run this:
hdparm --write-sector 1251470640 --yes-i-know-what-i-am-doing /dev/sda

After this I ran the read again and saw no error. So I figured I would rewrite all of them and then run the rebuild again. It also helps to check the reallocated sector count first:
smartctl -a /dev/sdb | grep -i reallocated
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0

If the value is 0, you can write to the sector, because the drive will try to reallocate it.
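
If you want that number in a script, the raw value is the tenth column of the attribute table. This is a small sketch, assuming the classic ATA attribute layout shown above; reallocated_count is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: print the raw Reallocated_Sector_Ct value from smartctl
# attribute output given on stdin.

reallocated_count() {
    awk '$2 == "Reallocated_Sector_Ct" { print $10 }'
}

# Sample lines in the format smartctl prints
reallocated_count <<'EOF'
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
EOF

# On a live system: smartctl -A /dev/sda | reallocated_count
```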
After I had rewritten all of those sectors I started the rebuild again, waited 2 hours, and another error appeared. I looked at dmesg again and saw other bad sectors right next to the ones I had already fixed.
So I started to wonder whether there were more, and tried reading a wider range to see if more errors showed up.
I ran something like this:
for i in $(seq 1251470640 1251478000) ; do hdparm --read-sector $i /dev/sda 2>&1 | grep -i error ; done
If any line shows an error, you have to find those sectors and rewrite them. You can extend this into a more complete script that writes to the sector whenever a read error occurs.
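
Since the bad sectors tended to sit close together, it can help to group the known ones into ranges before scanning. Here is a rough sketch that collapses a sorted sector list into start/end pairs. The gap of 64 sectors is an arbitrary assumption and sector_ranges is a made-up name.

```shell
#!/bin/bash
# Sketch: collapse a sorted list of sector numbers (one per line on stdin)
# into "start end" ranges. Sectors closer than $gap go into the same range.

sector_ranges() {
    local gap="${1:-8}" start="" prev="" s
    while read -r s; do
        if [ -z "$start" ]; then
            start=$s
        elif [ $((s - prev)) -gt "$gap" ]; then
            echo "$start $prev"   # close the current range
            start=$s
        fi
        prev=$s
    done
    if [ -n "$start" ]; then
        echo "$start $prev"       # last open range
    fi
}

# Sector numbers taken from the sorted dmesg output above
printf '%s\n' 1251470640 1251470641 1251471456 1251471488 1251477144 \
    | sector_ranges 64
```

Each printed pair can then be widened a little on both sides and used as the bounds for the read loop.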

A small script

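#!/bin/bash
# Read every sector in the range and overwrite the ones that fail to read.
# WARNING: --write-sector writes zeros, whatever the sector held is lost.
DISK=/dev/sdb
for i in $(seq 3645583000 3646853936)
do
    # Relies on hdparm returning a non-zero exit status when the read fails
    if ! hdparm --read-sector "$i" "$DISK" > /dev/null 2>&1; then
        echo "$i"
        hdparm --write-sector "$i" --yes-i-know-what-i-am-doing "$DISK"
    fi
done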

Once I had rewritten them all, I managed to rebuild the RAID.
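
While the rebuild runs, /proc/mdstat shows a progress line, so you do not have to wait blind for hours. Below is a minimal sketch for pulling the percentage and estimated finish time out of that line. The sample numbers are invented for illustration, rebuild_progress is a hypothetical name, and the pattern only looks for "recovery" lines.

```shell
#!/bin/bash
# Sketch: print "<percent> <time left>" for every array that is currently
# recovering. Reads mdstat-formatted text on stdin.

rebuild_progress() {
    sed -n 's/.*recovery = *\([0-9.]*%\).*finish=\([^ ]*\).*/\1 \2/p'
}

# Illustrative sample, not taken from the server in this post
rebuild_progress <<'EOF'
md2 : active raid1 sdb3[2] sda3[0]
      1073740664 blocks super 1.2 [2/1] [U_]
      [==>..................]  recovery = 12.6% (135291392/1073740664) finish=127.5min speed=33440K/sec
EOF

# On a live system: rebuild_progress < /proc/mdstat
```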