More RAID tidbits: monitoring all RAID events and changing the default email template
A geek really knows the importance of data, and of the backups that save you from pulling your hair out! When one of the hard drives on my server died after a well-served life of 6000+ hours, I found myself really lucky that the other component of the RAID1 array came to the rescue. The cause was perhaps a short circuit, which could have cost me the biggest data loss of my life, so a blazing smile was well deserved. Electric power is one of the countless things that don't work here the way they should (oh, it's a long story; I should tell some of it sometime later)!
I got an email from mdmonitor telling me about a DegradedArray event. But while I was rebuilding the array, I noticed that I got no alerts about the rebuild process or array status updates, which I really wanted to investigate. Until then, I didn't even know that 'mdadm --monitor' only mails you the critical updates. So I pulled up the man pages and saw that these are the critical events:
- DeviceDisappeared
- Fail
- FailSpare
- DegradedArray
The rest of the events are not reported by email at all! Also, RHEL5's mdadm package has a pre-compiled email template that mdadm uses when a critical event occurs, which I wanted to change as well because it looks pretty immature:
This is an automatically generated mail message from mdadm
running on HOSTNAME

A DegradedArray event had been detected on md device /dev/md1.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

bla bla bla
Seriously, it says “faithfully”… wth? Lol. We know that all machines are faithful to humans unless they're broken!
It definitely needed to be changed. Checking /etc/init.d/mdmonitor at least gave me the idea that the template itself is not something changeable: mdadm uses the default template when MAILADDR is specified in /etc/mdadm.conf. When the PROGRAM parameter is used instead, the template doesn't come into play; mdadm runs the given script and passes the event and the RAID array to it as arguments.
So this is what I did:
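To make the MAILADDR/PROGRAM distinction a bit more concrete, here is a small sketch that reports which alert mechanism a given mdadm.conf uses. The function name `alert_mode` is my own invention and not part of mdadm, and it only looks for lines that begin with the full MAILADDR or PROGRAM keyword; it makes no attempt to parse the file the way mdadm itself does.

```shell
#!/bin/bash
# alert_mode: hypothetical helper, not part of mdadm.
# Prints "mailaddr", "program", "both" or "none" depending on which
# alert keywords start a line in the given config file.
alert_mode() {
    local conf="${1:-/etc/mdadm.conf}"
    local has_mail=no has_prog=no

    grep -Eq '^[[:space:]]*MAILADDR[[:space:]]' "$conf" 2>/dev/null && has_mail=yes
    grep -Eq '^[[:space:]]*PROGRAM[[:space:]]' "$conf" 2>/dev/null && has_prog=yes

    case "$has_mail/$has_prog" in
        yes/yes) echo both ;;
        yes/no)  echo mailaddr ;;
        no/yes)  echo program ;;
        *)       echo none ;;
    esac
}
```

Calling `alert_mode /etc/mdadm.conf` before and after editing the file is a quick way to confirm which of the two settings is actually in there.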
# mdadm --detail --scan >> /etc/mdadm.conf
# echo "PROGRAM /etc/raidalerter" >> /etc/mdadm.conf
# sed -e '1i\DEVICE partitions' -i /etc/mdadm.conf
# cat /etc/raidalerter (create this file with the script below)

#!/bin/bash
echo -e "Likely an unfavourable or a bad thing just happened to your RAID. Even if its recovering, it was a bad thing which caused this! \n\n\n" $(cat -A /proc/mdstat | sed 's/\$/\\n/g') | mail -s "$1 on $2 $3 at $HOSTNAME" some-mail-address@example.com

# chmod +x /etc/raidalerter
# service mdmonitor restart
Provided that you have a working MTA, mail will be delivered on every RAID event, with the maximum verbosity possible. I don't think any hardware RAID does that?!
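For anyone who finds the cat/sed trick in my script a bit too clever, here is a tidier sketch of the same idea. It assumes, as my script does, that mdadm calls the program with the event first, then the array, then (sometimes) the affected device. The `build_subject` and `build_body` function names and the `MAILTO` and `MDSTAT` variables are made up for this example and are not mdadm settings; it also assumes a working `mail` command.

```shell
#!/bin/bash
# Sketch of an alternative /etc/raidalerter. Assumes mdadm calls it as:
#   raidalerter EVENT MD_DEVICE [COMPONENT_DEVICE]
# MAILTO and MDSTAT are illustrative variables, not mdadm settings.
MAILTO="${MAILTO:-some-mail-address@example.com}"
MDSTAT="${MDSTAT:-/proc/mdstat}"

# Build the subject line, leaving out the component device when there is none.
build_subject() {
    local event="$1" array="$2" component="${3:-}"
    echo "$event on $array${component:+ $component} at ${HOSTNAME:-$(hostname)}"
}

# Build the mail body: the event name, a blank line, then the array status.
build_body() {
    echo "RAID event: $1"
    echo
    cat "$MDSTAT"
}

# Only send mail when called with at least an event and an array.
if [ "$#" -ge 2 ]; then
    build_body "$1" | mail -s "$(build_subject "$1" "$2" "${3:-}")" "$MAILTO"
fi
```

Piping `cat` output straight into `mail` keeps the line breaks and indentation of /proc/mdstat as they are, so there is no need to smuggle newlines through `echo -e`.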
I then tested it on a small array to make sure that alerts are deliverable.
# mdadm /dev/md0 -f /dev/sdb1 -r /dev/sdb1
mdadm: set /dev/sdb1 faulty in /dev/md0
mdadm: hot removed /dev/sdb1
# mdadm /dev/md0 -a /dev/sdb1
mdadm: re-added /dev/sdb1
Preview:
Subject: RebuildFinished on /dev/md0 at ToughGuy

Likely an unfavourable or a bad thing just happened to your RAID. Even if its recovering, it was a bad thing which caused this!

Personalities : [raid1]
md1 : active raid1 sdb3[1] sda3[0]
724555520 blocks [2/2] [UU]

md0 : active raid1 sdb1[1] sda1[0]
4008064 blocks [2/2] [UU]

unused devices: <none>
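If you'd rather not fail a disk just to check that mail gets through, mdadm's monitor mode can generate a harmless TestMessage event instead. The sketch below wraps that in a function; the name `send_test_alerts` is made up for this post, the command needs root, and I'm assuming your mdadm version supports the `--oneshot` and `--test` options described in its man page.

```shell
#!/bin/bash
# Ask mdadm to emit a TestMessage event for every array it finds,
# then exit. Needs root; the wrapper function name is hypothetical.
send_test_alerts() {
    if ! command -v mdadm >/dev/null 2>&1; then
        echo "mdadm not found" >&2
        return 127
    fi
    # --scan:    take the arrays and alert settings from the config file
    # --oneshot: check the arrays once and exit instead of looping
    # --test:    generate a TestMessage alert for each array found
    mdadm --monitor --scan --oneshot --test
}
```

Run `send_test_alerts` as root and each array should produce one TestMessage mail through whatever MAILADDR or PROGRAM is configured.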

