Mar 9 2010

More RAID tidbits – Monitoring all RAID events and changing the default email template

A geek really knows the importance of data and backups; they're what save you from pulling your hair out! When one of the hard drives in my server died after a well-served life of 6000+ hours, I was really lucky, as the other component of the RAID1 array came to the rescue. The cause was perhaps a short circuit, which could have cost me the biggest data loss of my life, so a blazing smile was well deserved. Electric power is one of the countless things that don't work here the way they should (oh, it's a long story; I should tell some of it sometime later)!

I got an email from mdmonitor telling me about a DegradedArray event. But while I was rebuilding the array, I noticed I got no alerts about the rebuild process or array status updates, which I really wanted to keep track of. Until then, I didn't even know that 'mdadm --monitor' only emails you about the critical events. So I pulled up the man pages and saw that these are the critical events:

  • DeviceDisappeared
  • Fail
  • FailSpare
  • DegradedArray

The rest of the events aren't emailed at all! Also, RHEL5's mdadm package has a pre-compiled email template that mdadm sends when a critical event occurs, and I wanted to change that as well because it looks pretty immature:

This is an automatically generated mail message from mdadm running on HOSTNAME
A DegradedArray event had been detected on md device /dev/md1.
Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:
bla bla bla

Seriously, it says "faithfully"... wth? Lol. We know all machines are faithful to a human unless they're broken! :D It definitely needed to be changed. Checking /etc/init.d/mdmonitor at least showed me that the template itself isn't something you can change: mdadm uses the built-in template when MAILADDR is specified in /etc/mdadm.conf, but when the PROGRAM parameter is used instead, it runs that program and passes the event name, the md device and (if relevant) the component device to it as arguments.

So, this is what I did.


# mdadm --detail --scan >> /etc/mdadm.conf

# echo "PROGRAM /etc/raidalerter" >> /etc/mdadm.conf
# sed -e '1i\DEVICE partitions' -i  /etc/mdadm.conf
# vi /etc/raidalerter    (create this file with the script below)

#!/bin/bash
# mdadm runs this as: raidalerter EVENT MD_DEVICE [COMPONENT_DEVICE]
{
  echo "Likely an unfavourable or a bad thing just happened to your RAID. Even if it's recovering, it was a bad thing which caused this!"
  echo
  cat /proc/mdstat
} | mail -s "$1 on $2 $3 at $HOSTNAME" some-mail-address@example.com

# chmod +x /etc/raidalerter
# service mdmonitor restart

Provided that you have a working MTA, mail will be delivered for any RAID incident, at the maximum verbosity possible. I don't think any hardware RAID does that?!
I then tested it on a small array to make sure the alerts were deliverable.


# mdadm /dev/md0 -f /dev/sdb1 -r /dev/sdb1
mdadm: set /dev/sdb1 faulty in /dev/md0
mdadm: hot removed /dev/sdb1
# mdadm /dev/md0 -a /dev/sdb1
mdadm: re-added /dev/sdb1

Preview:

Subject: RebuildFinished on /dev/md0 at ToughGuy

Likely an unfavourable or a bad thing just happened to your RAID. Even if it's recovering, it was a bad thing which caused this!

Personalities : [raid1]
md1 : active raid1 sdb3[1] sda3[0]
      724555520 blocks [2/2] [UU]

md0 : active raid1 sdb1[1] sda1[0]
      4008064 blocks [2/2] [UU]

unused devices: <none>
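Since PROGRAM receives the event name as its first argument, the script can also tell the critical events apart from the merely informational ones. Below is a minimal sketch of a hypothetical variant of /etc/raidalerter that prefixes the subject with a severity tag; the function names raid_severity and raid_subject are my own, the four critical events are the ones from the man page list earlier in this post, and the address is the same placeholder as before. For brevity its body is just /proc/mdstat.

```shell
#!/bin/bash
# Hypothetical variant of /etc/raidalerter that tags the mail subject with a severity.
# mdadm runs the PROGRAM as: raidalerter EVENT MD_DEVICE [COMPONENT_DEVICE]

# The four critical events from the mdadm man page list are flagged as CRITICAL.
raid_severity() {
  case "$1" in
    DeviceDisappeared|Fail|FailSpare|DegradedArray) echo "CRITICAL" ;;
    *) echo "INFO" ;;
  esac
}

raid_subject() {
  # ${3:+ $3} appends the component device only when mdadm passed one
  echo "[$(raid_severity "$1")] $1 on $2${3:+ $3} at ${HOSTNAME:-$(hostname)}"
}

# Send mail only when invoked by mdadm with at least EVENT and MD_DEVICE
if [ $# -ge 2 ]; then
  mail -s "$(raid_subject "$@")" some-mail-address@example.com < /proc/mdstat
fi
```

With this in place, a RebuildFinished mail arrives as "[INFO] RebuildFinished on /dev/md0 at ToughGuy", while a Fail on /dev/sdb1 would be tagged [CRITICAL], which makes mail filtering rules easy.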


Mar 9 2010

Linux System Variables

Ever wanted to list all of the global or local variables stored for your shell? Well, you can, with the 'env' and 'set' commands.
env lists the global (environment) variables, while set lists every shell variable, local ones included. The difference between the two is that global variables are exported to every program the shell starts, while local variables live only in the current shell unless you export them. For example, MAILCHECK (which controls how often, in seconds, the shell checks for new mail and notifies you at the prompt) only appears in set's output.
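To see the difference for yourself, here's a small bash sketch; visibility is a hypothetical helper name, and it just greps the output of set and env for a variable name.

```shell
#!/bin/bash
# Report whether a variable shows up in `set` (all shell variables)
# and/or in `env` (only exported variables, i.e. what child processes inherit).
visibility() {
  local name=$1 where=""
  # grep without -q reads all of its input, so set/env never write to a closed pipe
  if set | grep "^${name}=" > /dev/null; then where="set"; fi
  if env | grep "^${name}=" > /dev/null; then where="${where:+$where }env"; fi
  echo "$where"
}

MYLOCAL="only in this shell"             # local (shell) variable
export MYGLOBAL="inherited by children"  # global (environment) variable

visibility MYLOCAL    # prints: set
visibility MYGLOBAL   # prints: set env
```

In an interactive bash session, where MAILCHECK is set but not exported by default, you'd expect `visibility MAILCHECK` to print just "set".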


Mar 8 2010

Arabian Beer Goggles

Lol, just kidding. It's about a new beer I've 'discovered' after searching for a long time for an alternative to Arabian Moose Beer, which had gone missing; brewery brands are scarce here, with no more than a few available where I live!
It's called Barbican and has some history behind it!

It's pretty awesome, and a friend of mine just made it more awesome. She said, "Barbi Can!".
Lol, what a name!


Mar 3 2010

Expanding the C: drive (system boot partition)

So, I ran out of space on the system partition of one of my primary Xen virtual machines. Yeah, things like this happen quite often when I underestimate my own needs. Unlike on Linux, where LVM lets you expand a volume or even move the system partition to a new disk without a single reboot, there's no such flexibility here. I guess Microsoft realised this is an important option for Windows to have, so Windows Server 2008 and Windows 7 provide it in Disk Management, with the ability to shrink or extend a system volume on the fly. But it's still a painful, risky process in XP or Server 2003. I'm familiar with third-party software that helps resize partitions, including GParted, but as always I prefer to follow vendor-supported methods on production machines, and here that was 'diskpart'. You can use diskpart by booting the system from a Server 2008 / Vista DVD's recovery tools or from WinPE. But first things first: there are three requirements you must meet before going ahead.

  1. The free space must be contiguous, immediately after the system partition.
  2. That free space must be primary space; it must not be a logical partition.
  3. It must also be unallocated (or deleted), with no 'drive' on it.

I had a 10 GB C: drive and a D: drive on a 20 GB disk, and added 10 GB more from XenCenter for a total of 30 GB. As I needed unallocated primary space right after the C: drive, I had to use robocopy to back up the D: drive's data to a network Samba share, then format it and split it into one primary and one extended partition.

[Screenshot of the diskpart session. Red: total system drive space before and after the expansion; blue: total free space on the disk before and after; green: commands issued.]


Feb 6 2010

Error Logging to Console

Sometimes, when applications start logging to the console, it can really make your eyes bleed when something goes wrong. I'm not talking about standard output or standard error, which are fairly easy to control, but about anything else emitted by the kernel, such as iptables logging, when you'd like to log certain rules only to a log file or the default syslog. But "hideously", such messages go to the system console too, and they can surprise you when you plug in a monitor and see a messy mesh of error logs all over the screen, especially when the machine is virtual; of course, in that case the VM will seem unresponsive ;)

I looked into syslog.conf to control this, but found out that it isn't something the syslog daemon has any say over; in fact, it's the kernel-level logging that needs to be modified. This can be done in either of these ways.

Instruct the kernel to print only severe messages to the console, instead of informational notifications.

This is controlled by the 'printk' kernel parameter. The defaults are:

# sysctl -a | grep printk
kernel.printk_ratelimit_burst = 10
kernel.printk_ratelimit = 5
kernel.printk = 6    4    1    7
# find /proc -name '*printk*' -exec cat {} \;
6    4    1    7

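Just change '6 4 1 7' to '3 4 1 7'. Only messages with a higher priority (a lower number) than console_loglevel reach the console, so 3 lets through just KERN_EMERG, KERN_ALERT and KERN_CRIT; use 4 if you still want KERN_ERR error messages on the console.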

# echo "kernel.printk = 3 4 1 7" >> /etc/sysctl.conf
# sysctl -p

The documentation is here, if you're interested: /usr/share/doc/kernel-doc-2.6.18/Documentation/sysctl/kernel.txt

printk:

The four values in printk denote: console_loglevel, default_message_loglevel, minimum_console_loglevel and default_console_loglevel respectively.

These values influence printk() behavior when printing or
logging error messages. See 'man 2 syslog' for more info on
the different loglevels.

- console_loglevel: messages with a higher priority than this will be printed to the console
- default_message_level: messages without an explicit priority will be printed with this priority
- minimum_console_loglevel: minimum (highest) value to which console_loglevel can be set
- default_console_loglevel: default value for console_loglevel

Definitions of the kernel log level numbers, from the syslog(2) manual:

#define KERN_EMERG    "<0>"  /* system is unusable               */
#define KERN_ALERT    "<1>"  /* action must be taken immediately */
#define KERN_CRIT     "<2>"  /* critical conditions              */
#define KERN_ERR      "<3>"  /* error conditions                 */
#define KERN_WARNING  "<4>"  /* warning conditions               */
#define KERN_NOTICE   "<5>"  /* normal but significant condition */
#define KERN_INFO     "<6>"  /* informational                    */
#define KERN_DEBUG    "<7>"  /* debug-level messages             */
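Putting the "higher priority than this" rule together with these numbers, here's a small bash sketch that lists which levels would reach the console for a given console_loglevel; levels_on_console is a hypothetical helper of mine, not a kernel or util-linux tool.

```shell
#!/bin/bash
# List the kernel log levels that reach the console for a given console_loglevel.
# Rule from kernel.txt: a message is printed if its level number is lower
# (i.e. its priority is higher) than console_loglevel.
levels_on_console() {
  local threshold=$1 i out=()
  local names=(EMERG ALERT CRIT ERR WARNING NOTICE INFO DEBUG)
  for ((i = 0; i < ${#names[@]} && i < threshold; i++)); do
    out+=("KERN_${names[i]}")
  done
  echo "${out[*]}"
}

# The current value is the first field of /proc/sys/kernel/printk
if [ -r /proc/sys/kernel/printk ] && read -r current _ < /proc/sys/kernel/printk; then
  echo "console_loglevel=$current: $(levels_on_console "$current")"
fi

levels_on_console 3   # prints: KERN_EMERG KERN_ALERT KERN_CRIT
```

With the default of 6, KERN_WARNING (4) still gets through, which is the level iptables' LOG target uses unless you give it --log-level.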

Change the console log level with dmesg

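Just keep in mind that this approach, like the printk setting above, only limits console logging; issuing dmesg still prints all of the logs from the kernel ring buffer. Also, unlike the sysctl.conf entry, a level set with dmesg -n does not survive a reboot.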

# dmesg -n 3
or
# dmesg -n 4

Jan 6 2010

HA LB Cluster on CentOS5 – Without actual heartbeat :P

Last month I wrote a howto for a highly available, load-balanced Piranha cluster using Red Hat's Cluster Suite. Until then, it wasn't obvious why one should use the Debian-style network load-balanced cluster in a production environment, when the actual "heartbeat" package and service create a lot of havoc on Red Hat machines. But my habit of doing classic things more manually kept me curious, and I found a flexible way of doing load-balanced clustering without needing the actual heartbeat service. The reasons I'm so against running it are numerous:
- Running heartbeat takes away the freedom to manage virtual IP addresses on the load balancer by hand,
- thus restricting expansion of the pools!
- ldirectord's daemon must be managed by heartbeat while heartbeat is running.
- It wastes resources; even a simple restart of the heartbeat service leaves it sitting there, waiting and waiting...
- And above all, I don't need a "second" load balancer for failover. In a simple environment, one load balancer running ldirectord is all it takes; acting as a divider and a monitor that distributes requests between the web servers, it does most of heartbeat's job.

Environment

Requirements: at least three systems (CentOS in my case), each with at least one IP. Packages: 'heartbeat' and 'heartbeat-ldirector' for load balancing, and 'ipvsadm' for the Linux IP Virtual Server. I know you're wondering why 'heartbeat' when we're not actually going to run it. Indeed we're not; it's only there for dependency resolution, or rather as a service startup requirement, I should say (/etc/ha.d/shellfuncs is the file needed)! And I swear we won't run it ;) ! Combined, these packages make up the Ultramonkey project, which describes different topologies of a functional HA LB cluster, but that's not our concern anyway :D (perhaps yours, if you've got a bit of free time)

Virtual IP: 12.12.12.60
Load Balancer: 12.12.12.61 aka VM1
Cluster Nodes/Real Servers:
Web Server1: 12.12.12.62 aka VM2
Web Server2: 12.12.12.63 aka VM3

We'll be using the LVS-DR (direct routing) approach for clustering; it's the most widely used and has fewer downsides.
Let's start by configuring the web servers first.

Cluster Nodes Configuration

1. On both web servers, VM2 and VM3, Apache should be running and serving a common file (for ldirectord's health checks).

# yum install httpd -y
# echo foo > /var/www/html/test.html
# service httpd start
# chkconfig httpd on

And to tell the two web servers apart during load testing, create at least one unique file on each of them.

[root@VM2 ~]# echo "This is VM2" > /var/www/html/index.html
[root@VM3 ~]# echo "This is VM3" > /var/www/html/index.html

2. The virtual IP needs to be terminated on both web servers, so we'll create a second network interface on each of them. Because all three servers will eventually carry the same VIP, this would cause a problem with ARP, which resolves IPs to MACs: more than one machine could answer for the VIP. There are different solutions to this problem. Some refer to iptables or arptables_jf, while many would recommend changing the default gateway route or hiding the network interface (by the way, don't use iptables or change the default gateway for this; Red Hat discourages both of these methods as they cause a lot of overhead). But the most flexible approach I've found is to:

a. create a loopback alias interface carrying the VIP, so the real servers accept traffic for it without advertising it to your network gateway/router.
b. instruct the Linux kernel (arp_announce = 2) to always use the best local address for the target as the source of its ARP requests, rather than the source address of the outgoing packet (which could be the VIP).
c. instruct the Linux kernel (arp_ignore = 1) to answer ARP requests only when the target IP is configured on the interface the request arrived on, so eth0 never answers for the VIP held on lo:0. The details are in the kernel's Documentation/networking/ip-sysctl.txt, if you're really curious about it.

# vi /etc/sysconfig/network-scripts/ifcfg-lo:0
DEVICE=lo:0
IPADDR=12.12.12.60
NETMASK=255.255.255.255
ONBOOT=yes
NAME=loopback
#
# vi /etc/sysctl.conf
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.eth0.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
net.ipv4.conf.eth0.arp_announce = 2
# sysctl -p
# ifup lo:0
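Before moving on, it's worth double-checking that the ARP settings actually made it into the file on both real servers. Here's a minimal bash sketch; check_arp_sysctls is a hypothetical helper, it assumes eth0 is the public interface as above, and it only inspects the file (compare with the running values via `sysctl net.ipv4.conf.all.arp_ignore` and friends).

```shell
#!/bin/bash
# Sketch: verify that a sysctl.conf-style file carries the ARP settings a
# direct-routing real server needs (values as used in this post, interface eth0).
# Usage: check_arp_sysctls [FILE]   (FILE defaults to /etc/sysctl.conf)
check_arp_sysctls() {
  local file=${1:-/etc/sysctl.conf} pair key want got status=0
  for pair in net.ipv4.conf.all.arp_ignore=1 net.ipv4.conf.eth0.arp_ignore=1 \
              net.ipv4.conf.all.arp_announce=2 net.ipv4.conf.eth0.arp_announce=2; do
    key=${pair%=*}
    want=${pair##*=}
    # Strip blanks, split on '=', keep the last matching assignment (as sysctl -p would)
    got=$(awk -F= -v k="$key" '{ gsub(/[ \t]/, "") } $1 == k { v = $2 } END { print v }' "$file")
    if [ "$got" != "$want" ]; then
      echo "$key: expected $want, found ${got:-nothing}"
      status=1
    fi
  done
  return "$status"
}
```

Run `check_arp_sysctls` on each web server; it prints one line per wrong or missing key and returns non-zero if anything is off, so it's easy to use in scripts.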

Load Balancer Configuration

We’ll be going through:

a. installing required packages
b. enabling IP forwarding,

# yum install heartbeat heartbeat-ldirector ipvsadm -y
# chkconfig --add ldirectord
# chkconfig --del heartbeat
# sed -i 's/net.ipv4.ip_forward = 0/net.ipv4.ip_forward = 1/' /etc/sysctl.conf

# sysctl -p

c. configuring a secondary eth0 alias for the VIP, as it's going to be exposed to the outside world or your local gateway, and

# vi /etc/sysconfig/network-scripts/ifcfg-eth0:0
DEVICE=eth0:0
BOOTPROTO=none
ONBOOT=yes
HWADDR=3a:5d:71:ad:67:47
NETMASK=255.255.255.0
IPADDR=12.12.12.60
GATEWAY=12.12.12.1
TYPE=Ethernet

d. finally, creating ldirectord.cf, the configuration file of our load balancer!!

# vi /etc/ha.d/ldirectord.cf
checktimeout=10
checkinterval=2
autoreload=no
logfile="/var/log/ldirectord.log"
quiescent=no
virtual=12.12.12.60:80
 real=12.12.12.62:80 gate
 real=12.12.12.63:80 gate
 service=http
 request="test.html"
 receive="foo"
 scheduler=wrr
 protocol=tcp
 checktype=negotiate

# service ldirectord start

With 'quiescent=no', ldirectord removes a real server from the IPVS table when it doesn't receive the expected response to its test.html request within ten seconds (checktimeout), marking that real server as dead until it's available again; with 'quiescent=yes', it would instead keep the server in the table with its weight set to 0. Note the "gate" switch in the 'real' servers' parameter values, which selects the LVS Direct Routing method. The other two methods are masq and ipip; details on them, along with the other options available for this configuration file (particularly the scheduler parameters), can be found in 'man ldirectord'.

Testing

Use 'ipvsadm' to list the current IPVS table managed by ldirectord. Make sure both real servers' IPs are listed there with a non-zero weight (with this default setup, it should be 1). If not, check the log file, run tcpdump on the load balancer and check the Apache logs on the real servers.
If everything works well, you'll see the content change when browsing to http://12.12.12.60/ multiple times (from a system outside these cluster nodes). Then stop httpd on one web server and browse to the URL again; all requests should now be served by the other web server.

[root@VM1 ~]# ipvsadm -l
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  12.12.12.60:http wrr
  -> 12.12.12.63:http             Route   1      0          0
  -> 12.12.12.62:http             Route   1      0          0

For more meaningful testing:

$ for i in $(seq 6); do curl http://12.12.12.60/index.html; done
 This is VM3
 This is VM2
 This is VM3
 This is VM2
 This is VM3
 This is VM2

I'll be posting a couple of optimization techniques soon, when I get some more free time. Stay tuned and take care :D

A variant of the same configuration, with quiescent servers, a different health-check file and the wlc scheduler:

checktimeout=10
checkinterval=2
autoreload=no
logfile="/var/log/ldirectord.log"
quiescent=yes
virtual=12.12.12.60:80
 real=12.12.12.62:80 gate
 real=12.12.12.63:80 gate
 service=http
 request="index.html"
 receive="hi"
 scheduler=wlc
 protocol=tcp
 checktype=negotiate

Dec 25 2009

Total Rsync Progress

If you use rsync frequently, you might be aware that rsync doesn't show overall (total) transfer statistics when you're syncing directories recursively. Even options like '--progress', '--stats' and '-vv' won't do that. I was searching for it, and was about to write a script that runs rsync in dry-run mode to measure overall progress, but found a patch written by Graeme Humphries. This patch was later incorporated into the latest development version, 3.1dev, downloadable at http://samba.anu.edu.au/ftp/rsync/dev/nightly/, as an option invoked with '--info=progress2'.

Excerpts from the man pages:

There is also a --info=progress2 option that outputs statistics based on the whole transfer, rather than individual files.  Use this flag without outputting a filename (e.g. avoid -v or specify --info=name0) if you want to see how the transfer is doing without scrolling the screen with a lot of names.  (You don't need to specify the --progress option in order to use --info=progress2.)

So I downloaded, compiled and installed this dev version, and guess what: no more creepy scrolling in the shell console filling up the screen with individual file progress.
Yeah, I know Red Hat and CentOS are slow to update their package repositories, but let's hope that when this build is final, Dag's repo will have an RPM for it.
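Until then, the dry-run idea I mentioned can at least tell you up front how much a sync is going to move. A minimal sketch, assuming the "Total transferred file size:" line that rsync's --stats output contains (newer versions may print thousands separators, which the sketch strips); bytes_to_transfer is a hypothetical helper name and the paths in the example are placeholders.

```shell
#!/bin/bash
# Sketch: read rsync --stats output on stdin and print the number of bytes the
# run would transfer. Digit-grouping commas, if present, are removed.
bytes_to_transfer() {
  sed -n 's/^Total transferred file size: \([0-9,]*\) bytes.*/\1/p' | tr -d ','
}

# Example (-n makes it a dry run, so nothing is copied):
#   rsync -an --stats /data/ backup:/data/ | bytes_to_transfer
```

Comparing that total with what a real run has copied so far gives a rough overall percentage, which is exactly what --info=progress2 now does for you properly.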


Dec 24 2009

Easiest way to create self-signed certificates

For Linux, it's the 'genkey' tool, part of the crypto-utils package available in Red Hat distros.

# genkey servername

And for Windows, the easiest way is SelfSSL, available in the IIS 6.0 Resource Kit Tools.
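Where genkey (crypto-utils) isn't available, plain OpenSSL can do the same job non-interactively. This is a swapped-in technique, not genkey itself; a minimal sketch, where make_selfsigned is a hypothetical helper and the 2048-bit RSA key and 365-day default are my own assumptions.

```shell
#!/bin/bash
# Sketch: self-signed certificate with plain OpenSSL (an alternative to genkey).
# Usage: make_selfsigned HOSTNAME [DAYS]  -> writes HOSTNAME.key and HOSTNAME.crt
make_selfsigned() {
  local name=$1 days=${2:-365}
  # -x509 makes req emit a self-signed certificate instead of a CSR;
  # -nodes leaves the private key unencrypted (no passphrase prompt at service start).
  openssl req -x509 -nodes -newkey rsa:2048 \
    -keyout "$name.key" -out "$name.crt" \
    -days "$days" -subj "/CN=$name"
}
```

For example, `make_selfsigned www.example.com` writes www.example.com.key and www.example.com.crt into the current directory; keep the key readable by root only.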


Dec 20 2009

ASCII Art in Linux

I'm fond of two ASCII art tools on Linux:

- linux_logo
- figlet

Both are available in the RPMForge/Dag's repository. The second one, figlet, draws ASCII art for any text you give it. It has a lot of font options available (see the man pages for figlet and figlist).


Dec 19 2009

Installing HPLIP 3.9.10 on CentOS 5.4 for newer printers (HP LaserJet M1120 MFP)

CentOS 5's base repository has an older version of HPLIP, something like '1.6.7', which of course isn't adequate to get newer HP printers, especially the LaserJet series, to work. Now, you may think the natural way to get it installed is to compile it from source; if you're thinking that, then no, that won't help, even after fulfilling all of the required dependencies! I got almost 14 errors when running the hp-check utility after compiling, and got them down to 10, but no further than that, if you know what I mean.

error: NOT FOUND! This is a REQUIRED/RUNTIME ONLY dependency. Please make sure that this dependency is installed before installing or running HPLIP.
error: NOT FOUND! This is a REQUIRED/RUNTIME ONLY dependency. Please make sure that this dependency is installed before installing or running HPLIP.
warning: NOT FOUND! This is an OPTIONAL/RUNTIME ONLY dependency. Some HPLIP functionality may not function properly.
warning: NOT FOUND! This is an OPTIONAL/RUNTIME ONLY dependency. Some HPLIP functionality may not function properly.
error: NOT FOUND! This is a REQUIRED/COMPILE TIME ONLY dependency. Please make sure that this dependency is installed before installing or running HPLIP.
error: Could not access file: No such file or directory
error: 10 errors and/or warnings.
-----------
| SUMMARY |
-----------
Please refer to the installation instructions at:

http://hplip.sourceforge.net/install/index.html

Pretty insane, though, since many of these dependencies were already installed. I'd assume this is why HPLIP isn't actively developed for CentOS and why it isn't current there; I saw quite a few HP devs and techs saying a big "no" to this community-based distribution when people complained on their Launchpad about these compilation errors. Plus, there were related HPLIP installation issues I found on the CentOS forums.

After finding myself in a disappointing (oops, wth) situation, I tried installing RHEL5's RPM (downloadable from HPLIP's site) after removing the source-installed version, but it too gave dependency errors, which I hoped to resolve, and I did later on.

Installing……

# rpm -ivh /Raid/hplip-3.9.10_rhel-5.0.i386.rpm
Preparing...                ########################################### [100%]
 file /usr/bin/hpijs from install of hplipfull-3.9.10-0.i386 conflicts with file from package hpijs-1.6.7-4.1.el5.4.i386
 file /usr/lib/libhpip.so.0.0.1 from install of hplipfull-3.9.10-0.i386 conflicts with file from package hpijs-1.6.7-4.1.el5.4.i386
 file /usr/lib/sane/libsane-hpaio.so.1.0.0 from install of hplipfull-3.9.10-0.i386 conflicts with file from package libsane-hpaio-1.6.7-4.1.el5.4.i386

So, I decided to remove the problematic hpijs package:


 Package                 Arch       Version          Repository        Size

Removing:
 hpijs                   i386       1:1.6.7-4.1.el5.4  installed         588 k
Removing for dependencies:
 libsane-hpaio           i386       1.6.7-4.1.el5.4  installed          94 k
 sane-backends           i386       1.0.18-5.el5     installed         3.1 M
 sane-backends-devel     i386       1.0.18-5.el5     installed          27 k
 sane-backends-libs      i386       1.0.18-5.el5     installed         5.2 M
 xsane                   i386       0.991-5.el5      installed         4.5 M

Transaction Summary
Install      0 Package(s)
Update       0 Package(s)
Remove       6 Package(s)

But I soon realised that it had also removed the sane packages, and libsane with them.

# rpm -ivh /Raid/hplip-3.9.10_rhel-5.0.i386.rpm
error: Failed dependencies:
 libsane.so.1 is needed by hplipfull-3.9.10-0.i386
 

Because installing sane again would also bring back hpijs and the other conflicting packages, the solution here was to remove the problematic packages without removing any of the dependencies that were still needed.

[root@ToughGuy ~]# rpm -ivh /Raid/hplip-3.9.10_rhel-5.0.i386.rpm
Preparing...                ########################################### [100%]
 file /usr/bin/hpijs from install of hplipfull-3.9.10-0.i386 conflicts with file from package hpijs-1.6.7-4.1.el5.4.i386
 file /usr/lib/libhpip.so.0.0.1 from install of hplipfull-3.9.10-0.i386 conflicts with file from package hpijs-1.6.7-4.1.el5.4.i386
 file /usr/lib/sane/libsane-hpaio.so.1.0.0 from install of hplipfull-3.9.10-0.i386 conflicts with file from package libsane-hpaio-1.6.7-4.1.el5.4.i386
#
# rpm -ev --nodeps libsane-hpaio
# rpm -ivh /Raid/hplip-3.9.10_rhel-5.0.i386.rpm
Preparing...                ########################################### [100%]
 file /usr/bin/hpijs from install of hplipfull-3.9.10-0.i386 conflicts with file from package hpijs-1.6.7-4.1.el5.4.i386
 file /usr/lib/libhpip.so.0.0.1 from install of hplipfull-3.9.10-0.i386 conflicts with file from package hpijs-1.6.7-4.1.el5.4.i386
#
# rpm -ev --nodeps hpijs
#
# rpm -ivh /Raid/hplip-3.9.10_rhel-5.0.i386.rpm
Preparing...                ########################################### [100%]
 1:hplipfull              ########################################### [100%]
#

To sum up, the overall steps are:

# yum install cups cups-devel 'ghostscript*' PyQt xsane -y
# Download hplip-3.9.10_rhel-5.0.i386.rpm from http://hplipopensource.com/hplip-web/install_wizard/index.html (choose RHEL5)
# rpm -ev --nodeps libsane-hpaio
# rpm -ev --nodeps hpijs
# rpm -ivh hplip-3.9.10_rhel-5.0.i386.rpm
# Reboot the system; if you're lucky enough, you'll see no errors
# reboot
# system-config-printer

Now configure the printer as usual. Just out of curiosity, this was my XenServer box that I installed it on (yeah, I know it sounds funny), and I got the scanner working fine as well with xsane (the LaserJet M1120 MFP is a combined printer and scanner). Check it out :D

Scan Test HP LaserJet M1120 MFP

NOTE: If this post helped you out or provided you with ways of troubleshooting, feel free to say a little thanks ;)