TECH NOTE
Testing HA Resources and Resource Groups
The Red Hat High Availability (HA) Add-On (née Cluster Suite) is a stable service failover
platform with easy-to-configure resources and resource (service) groups. Troubleshooting
resources can be very easy to accomplish, using the essential rg_test command.
Troubleshooting may appear complex to administrators who are new to cluster management. Numerous software component layers
(e.g., shared storage, multipathing, internode protocols, cluster management, volumes, filesystems, applications) contribute to
successful serving of a single resource group. Initial setup and configuration of HA nodes, fencing, quorum, domains and storage
comprise the crucial foundation of a stable failover cluster, supplemented by workloads made of resources and resource groups.
This Tech Note introduces rg_test, a command for testing resource and resource group configuration, e.g., resource agent methods,
start and stop order, and dependency trees. Begin with the rules subcommand to list recognized default and custom resource agent types
and operations. This short example displays active resource agent names only; a complete listing is more informative.
[root@node1 ~]# rg_test rules | grep "Resource Rules"
Running in rules mode.
Loaded 24 resource rules
Resource Rules for "apache"
Resource Rules for "ASEHAagent"
Resource Rules for "clusterfs"
Resource Rules for "fs"
Resource Rules for "ip"
Resource Rules for "lvm"
Resource Rules for "mysql"
Resource Rules for "named"
Resource Rules for "netfs"
Resource Rules for "nfsclient"
Resource Rules for "nfsexport"
Resource Rules for "nfsserver"
Resource Rules for "openldap"
Resource Rules for "oracledb"
Resource Rules for "orainstance"
Resource Rules for "oralistener"
Resource Rules for "postgres-8"
Resource Rules for "samba"
Resource Rules for "SAPDatabase"
Resource Rules for "SAPInstance"
Resource Rules for "script"
Resource Rules for "service"
Resource Rules for "tomcat-6"
Resource Rules for "vm"
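To list the agent type names alone, the output can be reduced one step further. The following is a minimal sketch, not part of rgmanager: the function name list_agent_types is hypothetical, and it assumes the "Resource Rules for" line format shown in the listing above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with rgmanager): read `rg_test rules`
# output on stdin and print only the agent type names, one per line.
# Assumes each agent is announced as: Resource Rules for "<type>"
list_agent_types() {
    sed -n 's/^Resource Rules for "\(.*\)"$/\1/p'
}

# Example use on a cluster node (requires rg_test, so left commented out):
#   rg_test rules | list_agent_types
```

Keeping the parsing in a function that reads stdin means it can be tried against saved output on a machine that is not a cluster member.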
Again using the rules subcommand, validate the saved cluster configuration. The command detects misconfigured or unrecognized
resource agents. If the command does not generate output, then no syntax or parsing errors were detected.
[root@node1 ~]# rg_test rules /etc/cluster/cluster.conf
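Because success is signalled by silence, the check is easy to script. The sketch below assumes, as stated above, that a clean configuration produces no output at all; the function name expect_no_output is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if stdin is empty. Any text received is
# treated as a reported problem, echoed to stderr, and causes a failure.
expect_no_output() {
    captured=$(cat)
    if [ -z "$captured" ]; then
        echo "no syntax or parsing errors reported"
        return 0
    fi
    printf '%s\n' "$captured" >&2
    return 1
}

# Example use on a cluster node (requires rg_test, so left commented out):
#   rg_test rules /etc/cluster/cluster.conf 2>&1 | expect_no_output
```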
The noop subcommand uses the same ordering rules and dependency tree parsing as rgmanager, but does not perform live resource
actions or state transitions. Use noop to confirm that a resource group is structurally correct. The successful ending message is the
same as for a normal service start, but here indicates only that parsing has completed. Test noop with stop service, also.
[root@node1 ~]# rg_test noop /etc/cluster/cluster.conf start service apache
Starting apache...
[start] service:apache
[start] lvm:havg
[start] fs:wwwfs
[start] ip:172.16.0.100/24
[start] apache:httpd
Start of apache complete
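The planned start order can be pulled out of the noop output for review or for comparison against an expected order. This is a small sketch that assumes the "[start] type:name" line format shown above; the function name start_order is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `rg_test noop ... start ...` output on stdin and
# print the resources in the order they would be started, as type:name lines.
start_order() {
    sed -n 's/^\[start\] //p'
}

# Example use on a cluster node (requires rg_test, so left commented out):
#   rg_test noop /etc/cluster/cluster.conf start service apache | start_order
```

For the apache example above, the result would be the five type:name entries from service:apache through apache:httpd, in that order.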
Individual resources can be noop tested, demonstrating resource recognition and start and stop ordering if the resource is a member of a
parent-child dependency. Syntax requires using the correct resource agent type, as in the following start lvm example.
Testing HA resources and service groups with rg_test
July 8, 2013
Page 1
[root@node1 ~]# rg_test noop /etc/cluster/cluster.conf start lvm havg
Starting havg...
[start] lvm:havg
[start] fs:wwwfs
Start of havg complete
The test subcommand actually starts (i.e., configures and runs) or stops (shuts down) the resource or resource group. This is a helpful
technique for debugging service failures before production rollout. Ensure that the service group is disabled or frozen during testing,
to stop rgmanager from detecting resource transitions and attempting to restart resources or relocate the service. If the service
remains enabled during testing, rgmanager may repeatedly restart components until the service is deemed unstable. Also, rg_test
only starts resource groups locally, on the same node where the command is being run. No failover domain logic is invoked.
[root@node1 ~]# clusvcadm -d apache    # or clusvcadm -Z apache to suspend the service.
Member node1 trying to disable httpd:apache...Success
[root@node1 ~]# rg_test test /etc/cluster/cluster.conf start service apache
Starting apache...
<info> mounting /dev/dm-10 on /var/www/html
[fs] mounting /dev/dm-10 on /var/www/html
<err> mount -t ext4 /dev/dm-10 /var/www/html
[fs] mount -t ext4 /dev/dm-10 /var/www/html
<debug> Link for eth1: Detected
[ip] Link for eth1: Detected
<info> Adding IPv4 address 172.16.0.100/24 to eth1
[ip] Adding IPv4 address 172.16.0.100/24 to eth1
<debug> Pinging addr 172.16.0.100 from dev eth1
[ip] Pinging addr 172.16.0.100 from dev eth1
<debug> Sending gratuitous ARP: 172.16.0.100 52:54:00:00:00:01 brd ff:ff:ff:ff:ff:ff
[ip] Sending gratuitous ARP: 172.16.0.100 52:54:00:00:00:01 brd ff:ff:ff:ff:ff:ff
<debug> Verifying Configuration Of apache:httpd
[apache] Verifying Configuration Of apache:httpd
<debug> Checking Syntax Of The File /etc/httpd/conf/httpd.conf
[apache] Checking Syntax Of The File /etc/httpd/conf/httpd.conf
<debug> Checking Syntax Of The File /etc/httpd/conf/httpd.conf > Succeed
[apache] Checking Syntax Of The File /etc/httpd/conf/httpd.conf > Succeed
<debug> Monitoring Service apache:httpd
[apache] Monitoring Service apache:httpd
<error> Checking Existence Of File /var/run/cluster/apache/apache:httpd.pid [apache:httpd] > Failed
[apache] Checking Existence Of File /var/run/cluster/apache/apache:httpd.pid [apache:httpd] > Failed
<error> Monitoring Service apache:httpd > Service Is Not Running
[apache] Monitoring Service apache:httpd > Service Is Not Running
<info> Starting Service apache:httpd
[apache] Starting Service apache:httpd
<debug> Looking For IP Addresses
[apache] Looking For IP Addresses
<debug> 1 IP addresses found for apache/httpd
[apache] 1 IP addresses found for apache/httpd
<debug> Looking For IP Addresses > Succeed - IP Addresses Found
[apache] Looking For IP Addresses > Succeed - IP Addresses Found
<debug> Checking: SHA1 checksum of config file /apache/apache:httpd/httpd.conf
[apache] Checking: SHA1 checksum of config file /apache/apache:httpd/httpd.conf
<debug> Checking: SHA1 checksum > succeed
[apache] Checking: SHA1 checksum > succeed
<debug> Generating New Config File /apache/apache:httpd/httpd.conf From /etc/httpd/conf/httpd.conf
[apache] Generating New Config File /apache/apache:httpd/httpd.conf From /etc/httpd/conf/httpd.conf
<debug> Generating New Config File /apache/apache:httpd/httpd.conf From /etc/httpd/conf/httpd.conf > Succeed
[apache] Generating New Config File /apache/apache:httpd/httpd.conf From /etc/httpd/conf/httpd.conf > Succeed
<debug> Starting Service apache:httpd > Succeed
[apache] Starting Service apache:httpd > Succeed
Start of apache complete
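Since forgetting to disable the service first is an easy mistake, the two steps can be tied together. The following is a sketch only: the wrapper name test_start_disabled and the CLUSVCADM, RG_TEST and CLUSTER_CONF variables are hypothetical, introduced so the sequence can be dry-run with echo on a machine without the cluster tools.

```shell
#!/bin/sh
# Command names and config path are overridable (hypothetical variables) so
# that the sequence can be previewed, e.g. with CLUSVCADM=echo RG_TEST=echo.
CLUSVCADM=${CLUSVCADM:-clusvcadm}
RG_TEST=${RG_TEST:-rg_test}
CLUSTER_CONF=${CLUSTER_CONF:-/etc/cluster/cluster.conf}

# Hypothetical wrapper: disable the service in rgmanager, then start it
# locally with rg_test. The start is skipped if the disable step fails.
test_start_disabled() {
    svc=$1
    "$CLUSVCADM" -d "$svc" || return 1
    "$RG_TEST" test "$CLUSTER_CONF" start service "$svc"
}

# Example use on a cluster node (left commented out):
#   test_start_disabled apache
```

Returning early when the disable step fails keeps rg_test from starting resources that rgmanager still believes it manages.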
Here is rg_test output when the requested service is already started on this node, either by an earlier rg_test invocation or
because the resource group is still enabled. The following example demonstrates that each resource is checked to see if it is already
configured or started, as appropriate, and evaluated as successful before continuing with the next resource.
[root@node1 ~]# rg_test test /etc/cluster/cluster.conf start service apache
Starting apache...
<debug> /dev/dm-10 already mounted
[fs] /dev/dm-10 already mounted
<debug> 172.16.0.100/24 already configured
[ip] 172.16.0.100/24 already configured
<debug> Verifying Configuration Of apache:httpd
[apache] Verifying Configuration Of apache:httpd
<debug> Checking Syntax Of The File /etc/httpd/conf/httpd.conf
[apache] Checking Syntax Of The File /etc/httpd/conf/httpd.conf
<debug> Checking Syntax Of The File /etc/httpd/conf/httpd.conf > Succeed
[apache] Checking Syntax Of The File /etc/httpd/conf/httpd.conf > Succeed
<debug> Monitoring Service apache:httpd
[apache] Monitoring Service apache:httpd
<debug> Monitoring Service apache:httpd > Service Is Running
[apache] Monitoring Service apache:httpd > Service Is Running
Start of apache complete
Commonly, rg_test is useful when a service has failed to start after being configured and enabled. The level of output detail assists in
pinpointing the troublesome resource component. As an example, here is output when the previous apache service remains running
on node1 while rg_test attempts to test the service's IP address resource on node2. An IP address may only exist once in a subnet,
therefore using the ip resource agent's status method (ping) to check for address availability before use causes the test to (properly) fail.
Notice that the syntax requires using the configured resource's name (172.16.0.100/24), which matches the IP address it configures.
[root@node2 ~]# rg_test test /etc/cluster/cluster.conf start ip 172.16.0.100/24
Starting 172.16.0.100/24...
<debug> Link for eth1: Detected
[ip] Link for eth1: Detected
<info> Adding IPv4 address 172.16.0.100/24 to eth1
[ip] Adding IPv4 address 172.16.0.100/24 to eth1
<debug> Pinging addr 172.16.0.100 from dev eth1
[ip] Pinging addr 172.16.0.100 from dev eth1
<err> IPv4 address collision 172.16.0.100
[ip] IPv4 address collision 172.16.0.100
Failed to start 172.16.0.100/24
Similarly, here is output for an rg_test test of the lvm resource named havg while that HA-LVM logical volume remains in use
elsewhere, i.e., activated on node1. Since HA-LVM is designed for use on only one node at a time (underneath ext3, ext4 and XFS),
the logical volume fails to activate on this additional node. Clustered LVM, the all-nodes-at-a-time method, allows simultaneous
logical volume activation on all nodes (appropriate only for GFS2) and would not fail in this scenario.
[root@node2 ~]# rg_test test /etc/cluster/cluster.conf start lvm havg
Starting havg...
Error locking on node node2.cluster.com: Volume is busy on another node
<err> Failed to activate logical volume, havg/halv
[lvm] Failed to activate logical volume, havg/halv
<notice> Attempting cleanup of havg/halv
[lvm] Attempting cleanup of havg/halv
Error locking on node node2.cluster.com: Volume is busy on another node
<err> Failed second attempt to activate havg/halv
[lvm] Failed second attempt to activate havg/halv
Failed to start havg
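Both failed runs above end with a "Failed to start" line preceded by error-level messages. When the output is long, a filter can surface the final error. This is a minimal sketch, assuming the "<err>"/"<error>" prefixes and the closing "Failed to ..." line seen in these examples; the function name last_error is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `rg_test test` output on stdin. If the run ended
# with a "Failed to ..." line, print the last <err>/<error> message seen.
# Error-level lines in a run that did not fail are ignored, because a
# successful start can log them too (for example, the pid-file check).
last_error() {
    awk '
        /^<err(or)?>/ { msg = $0 }
        /^Failed to / { failed = 1 }
        END { if (failed && msg != "") print msg }
    '
}

# Example use on a cluster node (requires rg_test, so left commented out):
#   rg_test test /etc/cluster/cluster.conf start lvm havg 2>&1 | last_error
```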
On node1, check the running service using status service, which uses the same resource agent monitoring methods as rgmanager.
[root@node1 ~]# rg_test test /etc/cluster/cluster.conf status service apache
Checking status of apache...
[fs] fs:wwwfs
[ip] ip:172.16.0.100/24
[apache] apache:httpd
Status of apache is good
Next, properly stop the service on node1. Similar to service startup, test output is detailed enough to be useful for debugging.
[root@node1 ~]# rg_test test /etc/cluster/cluster.conf stop service apache
Stopping apache...
<debug> Verifying Configuration Of apache:httpd
[apache] Verifying Configuration Of apache:httpd
<debug> Checking Syntax Of The File /etc/httpd/conf/httpd.conf
[apache] Checking Syntax Of The File /etc/httpd/conf/httpd.conf
<debug> Checking Syntax Of The File /etc/httpd/conf/httpd.conf > Succeed
[apache] Checking Syntax Of The File /etc/httpd/conf/httpd.conf > Succeed
<info> Stopping Service apache:httpd
[apache] Stopping Service apache:httpd
<info> Stopping Service apache:httpd > Succeed
[apache] Stopping Service apache:httpd > Succeed
<info> Removing IPv4 address 172.16.0.100/24 from eth1
[ip] Removing IPv4 address 172.16.0.100/24 from eth1
<info> unmounting /var/www/html
[fs] unmounting /var/www/html
Stop of apache complete
Start the service on node2 to confirm that it also runs successfully there, now that resource conflicts have been eliminated. However,
there appears to be a configuration problem, a missing httpd.conf file. Use normal system troubleshooting techniques to locate the
error. After correction, re-run rg_test test, using either start service again or stop service before start service. As seen previously,
restarting while started succeeds, since each resource is successfully evaluated and skipped if already configured or running.
[root@node2 ~]# rg_test test /etc/cluster/cluster.conf start service apache
Starting apache...
<info> mounting /dev/dm-10 on /var/www/html
[fs] mounting /dev/dm-10 on /var/www/html
<err> mount -t ext4 /dev/dm-10 /var/www/html
[fs] mount -t ext4 /dev/dm-10 /var/www/html
<debug> Link for eth1: Detected
[ip] Link for eth1: Detected
<info> Adding IPv4 address 172.16.0.100/24 to eth1
[ip] Adding IPv4 address 172.16.0.100/24 to eth1
<debug> Pinging addr 172.16.0.100 from dev eth1
[ip] Pinging addr 172.16.0.100 from dev eth1
<debug> Sending gratuitous ARP: 172.16.0.100 52:54:00:00:00:02 brd ff:ff:ff:ff:ff:ff
[ip] Sending gratuitous ARP: 172.16.0.100 52:54:00:00:00:02 brd ff:ff:ff:ff:ff:ff
<debug> Verifying Configuration Of apache:httpd
[apache] Verifying Configuration Of apache:httpd
<error> Checking Existence Of File /etc/httpd/conf/httpd.conf [apache:httpd] > Failed - File Is Not Readable
[apache] Checking Existence Of File /etc/httpd/conf/httpd.conf [apache:httpd] > Failed - File Is Not Readable
Failed to start apache
Finally, use rg_test to view resource tree differences between cluster.conf file versions. Augment an enterprise change management
policy by including a resource group delta listing or simply view changes made since last archiving the cluster configuration file.
[root@node1 ~]# rg_test delta /etc/cluster/cluster.conf /etc/cluster/cluster.conf.20130708
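A delta is only possible if an earlier copy of the file was kept. The sketch below shows one way to archive the configuration before editing it, using the same date-suffix naming as the example above; the function name archive_conf is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: copy a cluster configuration file to <file>.YYYYMMDD
# (the suffix style used in the delta example) and print the archive path.
# Permissions and timestamps are preserved with cp -p.
archive_conf() {
    conf=$1
    archive="$conf.$(date +%Y%m%d)"
    cp -p "$conf" "$archive" && printf '%s\n' "$archive"
}

# Example use on a cluster node (left commented out):
#   archived=$(archive_conf /etc/cluster/cluster.conf)
#   ...edit the configuration...
#   rg_test delta /etc/cluster/cluster.conf "$archived"
```

A second archive made on the same day overwrites the first, so a site with frequent changes may prefer a finer-grained suffix.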