Dear Admins, 
I have not received any suggestions yet on the following problem.  
If you have any suggestions, could you please let us know? 
The cluster configuration
(the cluster is connected to two networks, {C0,A0,B0} and {C1,A1,B1}):
  Machine         Name   Ethernet IP   FDDI IP   Memory Channel IP
  Cluster alias   C      C, C0         C1
  Member 1        A      A0            A1        Am
  Member 2        B      B0            B1        Bm
Some further information: 
11) Although I wrote earlier that X11 works, my test was wrong: X11 
from machine B to addresses Bx hangs, just as rsh etc. do.  It 
seems that the services that have the out_alias attribute in 
/etc/clua_services have problems. 
12) Machines A and B have somewhat different configurations.  Their 
primary and secondary network interfaces are:
  member  primary     secondary
  A       ee0 (A0)    fta0 (A1)
  B       fta0 (B1)   tu0 (B0)
Here, "primary" means that it is listed first in "netstat -i".
13) The gated log (/var/tmp/gated.log) on "B" contains a wrong entry: 
  ADD  "C"  255.255.255.255 gw "C1"  Kernel  pref 254/0 metric 0/0 "fta0" <NoAdvise Ext Active Gateway>
which should have been:
  ADD  "C"  255.255.255.255 gw "C"   Kernel  pref 254/0 metric 0/0 "tu0" <NoAdvise Ext Active Gateway>
and the corresponding "netstat -r" output on both "A" and "B" is:
  "C"     localhost          UH         55 94645282  lo0       
???
Compaq support will investigate this further on Thursday.  But if 
you have any suggestions, could you please let us know? 
>>> On Sat, 12 May 2001 10:43:37 JST,  Kazuro FURUKAWA <kazuro.furukawa_at_kek.jp>  wrote;
> Dear Admins,
> 
> We're experiencing some odd behavior with V5.1 TruCluster after the 
> patch T64V51AS0003-20010413 was applied by Compaq support (even after 
> another reboot). 
> 
> Here is the cluster configuration. 
>  node   name    ether   fddi    memchan
>  alias  C       C,C0    C1
>  1      A       A0      A1      Am
>  2      B       B0      B1      Bm
> 
> First, we noticed that rsh from B to B hangs, while ftp from B to B 
> works.  Then we found the following odd symptoms while we were 
> waiting for a chance to reboot. 
> 
> 0)  Almost all network activity works well.  For example, 
>     any access from A to Ax works.  (Ax means A0, A1, Am or localhost)
> 1)  rsh, rlogin, telnet from B to Bx hangs. (Ctrl-C can kill it.)
>     (Here Bx means B0, B1, Bm or localhost)
> 1') ping from B to Bx works. 
> 2)  ping from B to C0 or C gets no reply. 
> 2') ping from B to C1 works.
> 3)  traceroute from B to C shows 30 hops, all through gateway 
>     "localhost".  Are packets looping?
> 4)  ftp, rup, X11, smtp from B to Bx work.
> 5)  rsh, telnet, ftp from B to C0 always go to A. 
> 5') rsh, telnet, ftp from A to C follow the round-robin rule. 
> 6)  rsh, telnet, ftp from B to C1 work 3 times, then hang 3 times. 
> 7)  rup, rusers from other machines to C always go to B.
> 7') rsh from other machines to C follows the round-robin rule. 
> 8)  rup, rusers from B to C hang. 
> 8') rup, rusers from B to Bx work. 
> 9)  rup, rusers from B to C1 always go to B. 
> 10) netstat -i, -r on A and B do not show any noticeable differences. 
> 
> I first suspected the out_alias attribute in /etc/clua_services.  But 
> X11 works well with out_alias.  (I haven't changed clua_services yet.) 
> 
> We tried the following:
>  /sbin/init.d/gateway stop
>  /sbin/init.d/gateway start
>  cluamgr -r start
>  cfsmgr
>  kill -HUP {inetd}
>  /usr/sbin/sysman net_wizard
>  finally rebooting B
> 
> None of these cured the symptoms. 
> Compaq support has not provided further suggestions yet. 
> 
> Could someone help us? 
> 
> Regards. 
-----
Kazuro FURUKAWA
 Linac,  High Energy Accelerator Research Organization (KEK), Japan
Received on Mon May 21 2001 - 09:02:50 NZST