Availability Manager Version 1.4 Release Notes

The following notes address late-breaking information and known problems for the Availability Manager Version 1.4. These notes appear in the following categories:

o  Corrections, known problems, and new and changed features in Version 1.4
o  Configuration, setup, and installation notes
o  Startup and shutdown notes
o  Operation notes
o  Display notes

1 Corrections, Known Problems, and New and Changed Features in Version 1.4

The following sections discuss key problems that have been corrected and that remain in Version 1.4. Changes and new features in this release are also described.

1.1 Problems Corrected Since Version 1.3

This section discusses key problems that have been corrected in Version 1.4.

1.1.1 Patch Kits No Longer Needed on OpenVMS

The patch kit required in Version 1.3 of the Availability Manager is not required in Version 1.4. In Version 1.3, a POLYCENTER Software Installation Utility patch kit is required for versions of OpenVMS prior to Version 7.1, for Alpha and VAX systems:

   DEC-AXPVMS-VMS62TO71_PCSI-V0100--4.PCSI
   DEC-VAXVMS-VMS62TO71_PCSI-V0100--4.PCSI

These kits are available at the following websites, for Alpha and VAX computers, respectively:

   ftp://ftp.service.digital.com/public/vms/axp/v6.2/
   ftp://ftp.service.digital.com/public/vms/vax/v6.2/

1.1.2 System Failures on OpenVMS Nodes That Are Being Monitored

Earlier versions of the OpenVMS Data Collector (RMDRIVER) could infrequently cause system failures when they received certain types of truncated network packets. Transient network problems and software bugs sometimes truncated these packets, and the safeguards in RMDRIVER to detect this problem proved to be inadequate.

The Availability Manager Version 1.4 OpenVMS Data Collector contains more safeguards to prevent the system from failing under these circumstances.
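As a general illustration of the kind of safeguard that Section 1.1.2 describes, the following Python sketch rejects a truncated packet before any field is read past the end of the received data. The packet layout shown (a 2-byte type followed by a 2-byte payload length) is hypothetical; these notes do not describe the actual RMDRIVER packet format or its checks.

```python
import struct

# Hypothetical header layout, for illustration only:
# 2-byte packet type, 2-byte payload length, network byte order.
HEADER = struct.Struct("!HH")

def parse_packet(data):
    """Return (packet_type, payload), or None if the packet is truncated."""
    if len(data) < HEADER.size:
        return None  # too short to hold even the header
    packet_type, payload_len = HEADER.unpack_from(data)
    if len(data) - HEADER.size < payload_len:
        return None  # declared length exceeds the bytes actually received
    payload = data[HEADER.size:HEADER.size + payload_len]
    return packet_type, payload
```

Checking the declared length against the received length before slicing is the key step: a receiver that trusts the length field alone can read beyond the buffer when the network delivers a short packet.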
1.1.3 Long Delays in Discovering Data Collector Nodes on Small LANs

A software bug in the Version 1.3 Data Analyzer often resulted in delays of 10-20 minutes for nodes to be discovered on small LANs. This problem has been corrected in the Availability Manager Version 1.4 Data Analyzer, which now detects and displays Data Collector nodes noticeably faster.

1.1.4 User with Inadequate Page File Quota Cannot Run OpenVMS Data Analyzer

If a user with inadequate page file quota (PGFLQUOTA) tries to run the Availability Manager Data Analyzer on OpenVMS, an error message is displayed and the application stops. Inadequate PGFLQUOTA causes unusual behavior in the OpenVMS Java Virtual Machine, preventing the Availability Manager from starting and running normally.

Please refer to the OpenVMS Installation Instructions for the appropriate PGFLQUOTA settings.

1.2 Problems Remaining in Version 1.4

This section discusses known problems in Version 1.4.

1.2.1 Page and Swap File Names in Event List Display

If page and swap file events are signaled before the Data Analyzer has resolved their file names from the file ID (FID), events such as LOPGSP display the FID instead of the file name information. You can determine the file name for the FID by checking the File Name field in the I/O Page Swap Files page. The FID for the file name is displayed after the file name.

1.2.2 Events Sometimes Displayed After Background Collection Stops

On both OpenVMS and Windows systems, the Data Analyzer sometimes displays events after users customize their systems to stop collecting a particular kind of data. This is most likely to occur when the Data Analyzer is monitoring many nodes. Under these conditions, a data handler sometimes clears events before all pending packets have been processed. The events based on the data in these packets are displayed even though users have requested that this data not be collected.
1.2.3 Data Analyzer Might Not Recognize Impromptu Operating System Upgrades

If the Availability Manager Data Analyzer is monitoring an OpenVMS node that is shut down and then restarted with a different version of the operating system, the Data Analyzer does not recognize the change. Displays for this node continue to show the previous operating system version, and data collection for this node might also be affected.

1.3 Changes and New Features in Version 1.4

This section discusses the changes and new features in Version 1.4.

1.3.1 New Process States Added

The following table lists the new process states that have been added to the Availability Manager Version 1.4. These process states are shown on the CPU Process Summary page and on the Process Information page of the single process displays. (All of these process states were previously included in MWAIT.)

___________________________________________________________
Process State    Definition
___________________________________________________________
BYTLM Wait[1]    Process waiting for buffered I/O byte count
                 quota.
JIB Wait[1]      Process in either BYTLM Wait or TQELM Wait
                 state.
TQELM Wait[1]    Process waiting for timer queue entry quota.
EXH              Kernel thread in exit handler.
INNER_MODE       Kernel thread waiting to acquire inner-mode
                 semaphore.
PSXFR            Process waiting during a POSIX fork
                 operation.
___________________________________________________________
[1] Previously included in MUTEX value.

1.3.2 New Process States Reflected in Wait States Page

On the Wait States page for a single process display, states that have been reflected in the "Control" value prior to Version 1.4 are CEF, MWAIT, LEF, LEFO, RWAST, RWMBX, RWSCS, RWCLU, RWCSV, RWUNK, and LEF waiting for an ENQ.
The following states, introduced in Version 1.4, are now also reflected in the "Control" value: BYTLM Wait, INNER_MODE, JIB Wait, PSXFR, and TQELM Wait.

1.3.3 Definition of PRCMWT Event Changed

Additions have been made to the investigation hint for the "Process waiting in MWAIT" (PRCMWT) event, which now reads as follows:

   "Various resource wait states are part of the collective wait state called MWAIT. See Appendix A in the Availability Manager User's Guide for a list of these states. The state the process is in is displayed on the CPU Process page and the Single Process page. Check the Single Process pages to determine which resource the process is waiting for and whether the resource is still available for the process."

1.3.4 New Events Signaled

A number of new events are signaled in Version 1.4. The following table lists the type of data collection that can produce the event, the abbreviation of the event, and a short description of the event:

___________________________________________________________
Type of Data
Collection       Event       Description
___________________________________________________________
Single process   KTHIMD      Kernel thread waiting for
                             inner-mode semaphore.
Single process   PRCPSX      Process waiting in PSXFR wait
                             state.
Fix-generated    FXUERR      Unknown error code for fix.
Node-level       PKTFER      Packet format error.
___________________________________________________________

These events are explained further in Appendix B of the Availability Manager User's Guide.

1.4 Additional Information

This section contains a note with additional information.

1.4.1 Recognizing a System Failure Forced by the Availability Manager

Because a user with suitable privileges can force a node to fail from the Data Analyzer by using the "Crash Node" fix, system managers have requested a method for recognizing these particular failure footprints so that they can distinguish them from other failures. These failures all have identical footprints: they are operator-induced system failures in kernel mode at IPL 8.
The top of the kernel stack is similar to the following display:

   SP =>  Quadword system address
          Quadword data            1BE0DEAD.00000000
                                   00000000.00000000
          Quadword data            TRAP$CRASH
          Quadword data            SYS$RMDRIVER + offset

2 Configuration, Setup, and Installation Notes

The following notes pertain to configuring, setting up, and installing the Availability Manager.

2.1 Recommended Hardware Configurations

There are no minimum hardware requirements for the Data Collector. Compaq recommends using, at a minimum, one of the following hardware configurations on systems running the Data Analyzer:

___________________________________________________________
System                    Hardware
___________________________________________________________
Windows NT/Windows 2000   300 MHz Intel Pentium processor
                          with 96 MB of memory
Windows NT                500 MHz Alpha processor with
                          128 MB of memory
OpenVMS                   500 MHz Alpha processor with
                          128 MB of memory
___________________________________________________________

2.2 Notes on Installing the Data Analyzer on OpenVMS Systems

The following notes pertain to the installation of the Availability Manager Data Analyzer on OpenVMS systems.

2.2.1 Enabling and Disabling Kernel Multithreading

On multiple-CPU OpenVMS systems, the logical name AMDS$AM_MULTITHREADING controls whether or not the Availability Manager runs on multiple CPUs (that is, whether it uses kernel multithreading). This logical name is defined in the SYS$MANAGER:AMDS$LOGICALS.COM file.

Setting AMDS$AM_MULTITHREADING to TRUE can improve application performance, but at the cost of application stability. See Section 4.2.3 for an example of one stability problem.

Setting the logical name to FALSE (the default) forces the application to run on a single CPU. For the current set of patches available on OpenVMS, this approach offers the greatest stability.
Enabling and Disabling Commands

To enable kernel multithreading, set the logical to TRUE:

   $ AMDS$DEF AMDS$AM_MULTITHREADING TRUE

To disable kernel multithreading, set the logical to FALSE:

   $ AMDS$DEF AMDS$AM_MULTITHREADING FALSE

2.2.2 PCSI Installation Messages

If you install DECamds Version 7.3 after installing Availability Manager Version 1.4, you might see any of the following PCSI messages:

   %PCSI-I-RETAIN, file [SYS$LDR]SYS$RMDRIVER.EXE was not replaced because file from kit does not have higher generation number
   %PCSI-I-RETAIN, file [SYS$LDR]SYS$RMDRIVER.STB was not replaced because file from kit does not have higher generation number
   %PCSI-I-RETAIN, file [SYS$STARTUP]AMDS$STARTUP.COM was not replaced because file from kit does not have higher generation number
   %PCSI-I-RETAIN, file [SYS$STARTUP]AMDS$STARTUP.TEMPLATE was not replaced because file from kit does not have higher generation number
   %PCSI-I-RETAIN, file [SYSEXE]AMDS$RMCP.EXE was not replaced because file from kit does not have higher generation number
   %PCSI-I-RETAIN, module AVAIL was not replaced because module from kit does not have higher generation number
   %PCSI-I-RETAIN, file [SYSMGR]AMDS$DRIVER_ACCESS.DAT was not replaced because file from kit does not have higher generation number
   %PCSI-I-RETAIN, file [SYSMGR]AMDS$DRIVER_ACCESS.TEMPLATE was not replaced because file from kit does not have higher generation number
   %PCSI-I-RETAIN, file [SYSMGR]AMDS$LOGICALS.COM was not replaced because file from kit does not have higher generation number
   %PCSI-I-RETAIN, file [SYSMGR]AMDS$LOGICALS.TEMPLATE was not replaced because file from kit does not have higher generation number

These messages are to be expected because DECamds and the Availability Manager share all the files cited.
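If you save the installation output to a text file, a short script can confirm that every RETAIN message names one of the shared files listed in Section 2.2.2. The following Python sketch is one way to do this. It assumes that each message appears on a single line of the saved output; the function name unexpected_retains is hypothetical and is not part of any Compaq kit.

```python
import re

# Files and modules that Section 2.2.2 lists as shared between
# DECamds and the Availability Manager.
SHARED = {
    "[SYS$LDR]SYS$RMDRIVER.EXE",
    "[SYS$LDR]SYS$RMDRIVER.STB",
    "[SYS$STARTUP]AMDS$STARTUP.COM",
    "[SYS$STARTUP]AMDS$STARTUP.TEMPLATE",
    "[SYSEXE]AMDS$RMCP.EXE",
    "AVAIL",
    "[SYSMGR]AMDS$DRIVER_ACCESS.DAT",
    "[SYSMGR]AMDS$DRIVER_ACCESS.TEMPLATE",
    "[SYSMGR]AMDS$LOGICALS.COM",
    "[SYSMGR]AMDS$LOGICALS.TEMPLATE",
}

# Captures the file or module name from a RETAIN message.
RETAIN = re.compile(r"%PCSI-I-RETAIN, (?:file|module) (\S+) was not replaced")

def unexpected_retains(log_text):
    """Return RETAIN targets that are not in the shared list."""
    return [name for name in RETAIN.findall(log_text) if name not in SHARED]
```

An empty result means that every RETAIN message in the saved output refers to a shared file and can be ignored. Any name that is returned refers to a file outside the list above and is worth a closer look.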
2.2.3 Postinstallation Task: Editing Command File for Online Help

The Netscape browser program on your system might not be in the directory specified in the AMDS$AM_SYSTEM:AMDS$AM_LAUNCH_BROWSER.COM command file, which is part of the installation of the Availability Manager. If this is the case, you must edit this file to display online help.

To define the correct location of Netscape, edit the following line in the command file to reflect the location and name of the Netscape browser program on your system:

   $ Netscape:= $SYS$COMMON:[NETSCAPE.ALPHA]NETSCAPE-JAVA.EXE

2.3 Notes on Installing the Data Analyzer on Windows Systems

The following notes pertain to the installation of the Availability Manager Data Analyzer on Windows NT and Windows 2000 systems.

2.3.1 Upgrading to Windows 2000

If you upgrade to Windows 2000 after installing the Availability Manager Version 1.4 on Windows NT 4.0, you must reinstall the Version 1.4 kit. When you reinstall, select the "Modify" option on the Windows Installation Welcome box. The reinstallation installs Windows 2000-compatible network drivers.

2.3.2 Running the Self-Extracting .EXE Multiple Times

The Availability Manager software for Windows systems is packaged in a self-extracting executable (.EXE). On Alpha systems, if you run multiple installations of Availability Manager Version 1.4, the .EXE unpacks the installation in the same temporary folder. As a result of a duplicate installation, the system displays a message box entitled Overwrite Protection, which contains a message that "the following file is already installed on your system... Do you wish to overwrite the file?"

You can ignore these messages. Click Yes to All.

2.3.3 Registry Subkey Message

In some situations during an installation, the system displays the message "Registry Service Subkey already exists." You can ignore this message.
2.3.4 Self-Extracting Executable Does Not Exit

In some situations, the self-extracting executable extracts the installation package but does not exit and start the installation. When this occurs, the system displays the "Unpacking" progress bar, and then nothing happens. Windows Task Manager shows the self-extracting executable as an active process, but it appears to be stalled. To activate the Availability Manager installation, press Ctrl + Alt + Delete, and then choose Cancel. The InstallShield progress bar then appears, and the installation continues normally.

2.3.5 Problem with the Reboot Dialog Window on Intel Platforms if Another Window Is Open

If you have any other window open (such as the Windows Explorer) during an installation, this window will be in front of the reboot dialog box at the end of the installation. Look for InstallShield Wizard in the task bar, and single-click it to bring the reboot window to the front.

Note that you will also see this problem at the end of an uninstall operation.

2.3.6 Problem with the Shared Files Dialog Window on Intel Platforms During an Uninstall Operation

If you have any other window open (such as the Windows Explorer) during an uninstall operation, the status box is moved to the back when the uninstall operation encounters a shared file to be removed. Look for InstallShield Wizard in the task bar, and single-click it to bring the message box about the shared file to the front. You can then click Yes to remove a shared file or No to keep the file.

3 Startup and Shutdown Notes

The following notes pertain to starting up and shutting down the Availability Manager.

3.1 Avoid Using Multiple Data Analyzers on the Same System

If the Availability Manager is shut down improperly or abruptly on a Windows system, the AM_SESSION.LOCK file might not be deleted, thereby preventing subsequent sessions from starting.
In this situation, when you try to start the Data Analyzer, you will see the following warning:

   Could not establish session lock! Another AM session may be running.

Either one of the following situations might exist:

o  Two sessions of the Availability Manager have overlapped, and the later session has detected the lock from the earlier session. Either use the earlier session, or shut down the old session before you start the new session.

o  A Data Analyzer session terminated abnormally. If this occurs, follow these steps:

   1. Delete the file AM_SESSION.LOCK in your installation directory.
   2. Try to restart the Data Analyzer.
   3. If the Data Analyzer fails to start, restart your system to clear any possible driver confusion.

3.2 Restarting After an Uninstall Operation on a Windows System

To uninstall the Availability Manager on a Windows system using Add/Remove Programs on the Windows Control panel, follow these steps:

   1. Uninstall the software.
   2. Restart the system. This step completes the removal of the network bindings.
   3. Optionally, reinstall the software.

If you omit step 2, starting the Availability Manager could cause the system to fail. To recover from this situation, restart the system and then reinstall the Availability Manager (uninstalling the software again is not necessary). Finally, restart your system at the end of the installation. The Availability Manager should run properly.

4 Operation Notes

Availability Manager operation notes fall into the following categories:

o  General information
o  Known problems

4.1 General Information

The notes in this section contain information about the general operation of the Availability Manager.

4.1.1 Some DECamds Features Not Yet Implemented

The Availability Manager is, in most respects, a Java implementation of the DECamds availability management software product. With each release, more features of DECamds are being added to the Availability Manager.
However, not all features have yet been implemented in the Availability Manager. These features are planned to be added in future releases.

4.1.2 Data Collection and Events on OpenVMS Nodes

Node summary data is the only data that is collected by default. The Availability Manager looks for events only in data that is being collected.

You can collect additional data in either of the following ways:

o  Opening any display page that contains node-specific data (for example, CPU, memory, I/O) automatically starts foreground data collection and event analysis, except for Lock Contention and Cluster Summary information (you must select these tabs individually to start foreground data collection). Collection and evaluation continue as long as a page with node-specific data is displayed. Refer to the nodes chapter in the manual for details.

o  Clicking a check mark on the Customize OpenVMS... menu Data Collection page enables background collection of that type of data. Data is collected and events are analyzed continuously until you remove the check mark. Refer to the overview and customization chapters in the manual for details.

4.1.3 Limit Your Background Collection of Detailed Data

By default, the only data collected on OpenVMS nodes is node summary data. You can collect this data on many nodes without incurring performance problems. If you do not have a high-performance workstation, and you have many nodes configured, be careful about enabling more data collection on the customization Data Collection page. This is especially true when you run the Data Analyzer on OpenVMS systems.

A new feature since Version 1.3 might help satisfy your data collection needs: when you open a node-specific data page, all types of data are automatically collected for that node.

4.1.4 Size of Event Log

If you are collecting data on many nodes, running the Availability Manager for a long period of time can result in a large event log.
For example, in a run that monitors more than 50 nodes with most of the background data collection enabled, the event log can grow by up to 30 MB per day. At this rate, systems with small disks might fill up the disk where the event log resides.

Closing the Availability Manager application will enable you to access the event log for tasks such as archiving. Starting the Availability Manager starts a new event log.

4.2 Known Problems

The notes in this section discuss known problems with Version 1.4 of the Availability Manager.

4.2.1 Windows NT Data Collector Does Not Recognize New Disk Configurations

If you change the logical disk configuration on a running Windows NT node, the Data Collector does not recognize the modified disk configuration and continues to report the previous configuration to the Data Analyzer. For the Data Collector to recognize the new disk configuration, you must stop and restart the Data Collector (PerfServ).

4.2.2 Problem with Daylight Saving Time Changes

For some time zones, especially European ones, the time-zone logic in the Java software libraries that the Data Analyzer uses might disagree with the Windows operating system about when the shift to daylight saving time occurs. For a two-week period in early April and late October, you might see a one-hour discrepancy between the time shown in the Data Analyzer and the time of day shown by the system and the Date-Time Control panel.

Also, Sun's Java classes disagree with Windows about whether daylight saving time even exists for Asian time zones. The Windows DateTime CP usually indicates that daylight saving time is not possible for these zones; time strings generated from the calendar classes in Java appear to recognize a daylight saving time shift. Therefore, for all time zones between eastern Europe, going east to Alaska, a one-hour discrepancy is likely from April through October. This discrepancy occurs for months at a time.
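The following Python sketch illustrates how two components that apply different daylight saving time rules end up one hour apart between their two transition dates. Both rules shown here (a shift on the last Sunday of March and a shift on the first Sunday of April) are hypothetical examples chosen for illustration; these notes do not state which rules the Java libraries or Windows actually apply.

```python
from datetime import date, timedelta

def last_sunday(year, month):
    """Return the date of the last Sunday in the given month."""
    first_of_next = date(year + (month == 12), month % 12 + 1, 1)
    day = first_of_next - timedelta(days=1)
    # weekday() is 0 for Monday and 6 for Sunday.
    return day - timedelta(days=(day.weekday() + 1) % 7)

def first_sunday(year, month):
    """Return the date of the first Sunday in the given month."""
    day = date(year, month, 1)
    return day + timedelta(days=(6 - day.weekday()) % 7)

def mismatch_days(year):
    """Days on which hypothetical rule A (shift on the last Sunday of
    March) is already on daylight time while hypothetical rule B
    (shift on the first Sunday of April) is still on standard time."""
    start_a = last_sunday(year, 3)
    start_b = first_sunday(year, 4)
    return [start_a + timedelta(days=i) for i in range((start_b - start_a).days)]
```

On each day that mismatch_days returns, a clock driven by rule A reads one hour later than a clock driven by rule B, even though both are set to the same time zone.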
For OpenVMS systems, make sure that the time zone differential logical name SYS$TIMEZONE_DIFFERENTIAL is defined correctly.

4.2.3 Occasional Application Failure for Data Analyzer on OpenVMS Systems

If you are running the Data Analyzer on a multiprocessor OpenVMS system, you might encounter a "SIGBUS 10" application error. In this application error, your output window displays several hundred lines of low-level thread state. Compaq has seen this only when kernel multithreading was enabled for the process. Section 2.2.1 contains instructions for disabling kernel multithreading. Future patch kits for the kernel-threads subsystem on OpenVMS might solve this problem.

Note that disabling kernel multithreading for the Data Analyzer does not disable application-level multithreading within the Java Virtual Machine or affect kernel multithreading for other applications on the OpenVMS system.

4.2.4 Event Reporting Problems

The following list contains known event reporting problems that have been reported in Version 1.4:

o  Unimplemented threshold events:

      LOSTVC
      NOPROC

o  Event reporting irregularities:

   -  Some posted events may not be canceled promptly when the condition goes away.
   -  LOVOTE and LOVLSP events are posted for every node in the cluster rather than once per cluster.

4.2.5 Out-of-Memory Problems on Long Runs

If a session runs for many days, and the Data Analyzer is collecting data on many nodes, the Data Analyzer might run out of virtual memory (object heap). (See the installation instructions for Windows or OpenVMS for details on modifying the heap size.)

On Windows systems, the Data Analyzer does not report the problem. On OpenVMS systems, the Data Analyzer displays an "OutOfMemoryException" error in the window in which the Data Analyzer was started. On either system, one or more parts of the display might stop updating.

The only workaround is to restart the Data Analyzer.
5 Display Notes

The following notes pertain to the display of data on Availability Manager pages and have been organized under the following headings:

o  Problems Using the Data Analyzer on All Platforms
o  Problems Using the Data Analyzer on OpenVMS

5.1 Problems Using the Data Analyzer on All Platforms

The problems discussed in this section apply to running the Data Analyzer on all platforms.

5.1.1 Hardware Model Sometimes Not Displayed on Node Summary Page

For some long hardware model names, the Node Summary page hides most of the model name. On OpenVMS nodes, you can force the page to reveal the name by clicking the portion of the name that is visible and scrolling right. This problem will be resolved in the next release.

5.1.2 Problem Displaying Help in Some Browsers on Windows

The following problems have been observed when using Version 4.7 of Netscape and some versions of Internet Explorer:

o  When you select Help, Windows presents a misleading error message, and the Help page is not shown.

o  When you select Help, the Help page is eventually displayed, but Windows presents a misleading error message anyway.

These problems have not been seen with Netscape Version 4.5; Compaq has not tested other versions of Netscape.

5.1.3 Incomplete Repainting of Windows

If you obscure part of an Availability Manager window with another window, the obscured portion of the Availability Manager window might not repaint completely when you move the top window. This appears to be a Java Swing problem that is currently under investigation.

5.2 Problems Using the Data Analyzer on OpenVMS

The problems discussed in this section apply to running the Data Analyzer on OpenVMS systems.

5.2.1 Problem Exiting Field on OpenVMS Data Collection Customization Page

In customizing the OpenVMS Data Collection page on OpenVMS, if you change a data collection interval and press Enter to exit the field, the value is not entered as expected. You must use the mouse to move the cursor out of the field.
5.2.2 Long Runs Exhaust XLIB Resource ID

The version of Motif currently shipping with OpenVMS is based on X11R5. That release of X11 uses a resource ID allocation scheme that works poorly with the Motif support in Java for OpenVMS. As a result, most long-running Availability Manager sessions will stop updating the display at a time that depends on the speed of the OpenVMS machine. For example, a session running on a dual-processor 275 MHz system reported the following after 14 hours:

   Xlib: resource ID allocation space exhausted!

On faster machines, this message was reported after only 8 hours. This problem is under investigation.