Dear Admins,
Our V5.1 TruCluster has been behaving oddly since Compaq support 
applied patch T64V51AS0003-20010413, and the behavior persists even 
after another reboot. 
Here is the cluster configuration. 
 node   name  ether  fddi  memchan
 alias  C     C,C0   C1
 1      A     A0     A1    Am
 2      B     B0     B1    Bm
First, we noticed that rsh from B to B hangs, while ftp from B to B 
works.  Then we found these interesting symptoms while we were waiting 
for an opportunity to reboot. 
0)  Almost all network activity works.  For example, 
    any access from A to Ax works.  (Ax means A0, A1, Am or localhost.)
1)  rsh, rlogin, telnet from B to Bx hang.  (Ctrl-C can kill them.)
    (Here Bx means B0, B1, Bm or localhost.)
1') ping from B to Bx works. 
2)  ping from B to C0 or C gets no reply. 
2') ping from B to C1 works.
3)  traceroute from B to C shows 30 hops, each reported as "localhost". 
    Are the packets looping?
4)  ftp, rup, X11, smtp from B to Bx work.
5)  rsh, telnet, ftp from B to C0 always go to A. 
5') rsh, telnet, ftp from A to C follow the round-robin rule. 
6)  rsh, telnet, ftp from B to C1 work 3 times, then hang 3 times. 
7)  rup, rusers from other machines to C always go to B.
7') rsh from other machines to C follows the round-robin rule. 
8)  rup, rusers from B to C hang. 
8') rup, rusers from B to Bx work. 
9)  rup, rusers from B to C1 always go to B. 
10) netstat -i and netstat -r on A and B show no noticeable differences. 
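To repeat the checks above in a consistent order on each node, something 
like the following could be used.  It is only a sketch: the function name 
probe_matrix and the service list are illustrative assumptions, the host 
names are the placeholders from the table above, and the script only 
prints the probes to run by hand; it does not execute them. 

```shell
#!/bin/sh
# Sketch only: print one probe per (service, target) pair so the result
# (works / hangs / goes to A / goes to B) can be noted next to each line.
# Nothing is executed; names such as B0 or C1 are placeholders.

SERVICES="ping rsh telnet ftp rup"

probe_matrix() {
    # $1 = label of the node the probes are run from,
    # remaining arguments = target names to probe
    src=$1
    shift
    for target in "$@"; do
        for svc in $SERVICES; do
            printf '%s -> %s: %s %s\n' "$src" "$target" "$svc" "$target"
        done
    done
}

# Probes to run on B against its own interfaces and the cluster alias
probe_matrix B B0 B1 Bm localhost C C0 C1
```

Running the same list on A, with A0, A1 and Am as the targets, gives the 
matching set for comparison. 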
I first suspected the out_alias attribute in /etc/clua_services.  But X11 
works well with out_alias.  (I have not changed clua_services yet.) 
We tried the following:
 /sbin/init.d/gateway stop
 /sbin/init.d/gateway start
 cluamgr -r start
 cfsmgr
 kill -HUP {inetd}
 /usr/sbin/sysman net_wizard
 finally rebooting B
None of these cured the symptoms. 
Compaq support has not provided any further suggestions yet. 
Could someone help us? 
Regards. 
-----
Kazuro FURUKAWA
 Linac,  High Energy Accelerator Research Organization (KEK), Japan
Received on Sat May 12 2001 - 01:44:51 NZST