Compaq Availability Manager Version 2.0 Release Notes

The following notes address late-breaking information and known problems for the Availability Manager Version 2.0. These notes fall into the following categories:

o Corrected problems, and new and changed features in Version 2.0
o Configuration, setup, and installation notes
o Startup and shutdown notes
o Operation notes
o Display notes

1 Corrected Problems, and New and Changed Features in Version 2.0

The following sections discuss key problems that have been corrected in Version 2.0. Changes and new features in this release are also described.

1.1 Data Analyzer Problems Corrected Since Version 1.4

This section discusses key Data Analyzer problems that have been corrected in Version 2.0.

1.1.1 Availability Manager Version 2.0 on OpenVMS Does Not Require Java to be Installed

The Availability Manager Version 2.0 Data Analyzer application now packages its own Java run-time environment and is independent of any other version of Java that might be installed on the system. (The Availability Manager on Windows NT and Windows 2000 systems also has its own Java run-time environments.)

1.1.2 Significantly Higher Threshold Limits Are Now Available

The previous threshold limits were too small for today's huge CPU and memory configurations. All such limits have been increased to very large values. For more detailed information, see Section 4.1.1.

1.1.3 Problems on Windows When More Than One Network Card Is Installed on a System

The Availability Manager Version 2.0 Windows installation now detects when a system has more than one network card and asks the user to select one for Availability Manager LAN communication. To change your choice at a later time to use a different network card, you must reinstall Availability Manager Version 2.0. This feature applies to both Windows NT and Windows 2000 systems.

1.1.4 Availability Manager Session Lock Error

Prior to Version 2.0, if the Availability Manager application terminated improperly, the AM_Session.lock data file remained on the system. On subsequent attempts to start the application, this file caused the Availability Manager to behave as though another session was already running. A new mechanism now detects this situation without using the data file, and the confusion no longer occurs.

1.1.5 Problems with Disk Data Filtering

Several problems filtering disk data because of incorrect volume names and disk status data have been corrected in Availability Manager Version 2.0.

1.1.6 Interoperability with WRQ's Reflection Product

An interoperability problem with WRQ's Reflection product has been corrected in Reflection Version 9.0.

1.1.7 Problems Displaying Online Help

Several problems using different browsers to display the HTML online documentation were reported for previous versions of the Availability Manager. Version 2.0 includes a built-in browser to display online documentation and therefore is no longer dependent on whatever browser is available on a particular system.

1.2 Data Collector Problems Corrected Since Version 1.4

This section discusses key Data Collector (RMDRIVER) problems that have been corrected in Version 2.0. You must install the new kit on each OpenVMS Data Collector node.

1.2.1 System Failures on OpenVMS Nodes That Are Being Monitored

Earlier versions of the OpenVMS Data Collector (RMDRIVER) could infrequently cause system failures when they received certain types of truncated network packets. Transient network problems and software bugs sometimes truncated these packets, and the safeguards in RMDRIVER to detect this problem proved to be inadequate. Availability Manager Version 1.4 and later OpenVMS Data Collectors contain more safeguards to prevent the system from failing under these circumstances.

1.3 Changes and New Features in Version 2.0

This section discusses the changes and new features in Version 2.0.

1.3.1 Group Status at a Glance

Previous versions of Availability Manager required you to select a group and display the nodes in that group to be able to detect a problem. A new color scheme shows, at a glance, the status of all nodes in a group, whether or not that group is selected.

1.3.2 New Internal Infrastructure to More Easily and Quickly Support New Operating System Features

New support has been added to the OpenVMS Data Collector, RMDRIVER, for OpenVMS managed objects, operating system components with characteristics that allow the Availability Manager to manage them. Managed objects, which register themselves with the Data Collector at system startup, not only provide data but also implement fixes in response to client requests. In OpenVMS Version 7.3, cluster data and fixes are available for LAN virtual circuits through the managed object interface.

When the Availability Manager Version 2.0 Data Analyzer connects to a Data Collector node, it retrieves a list of the managed objects on that node, if any. For such a node, the Availability Manager can provide additional details and new data that would otherwise be unavailable.

Managed objects are currently available only on OpenVMS Version 7.3. Before a Data Collector node can make managed object data available to Data Analyzers, the system manager must take steps so that the Data Collector driver, RMDRIVER, is loaded early in the boot process. See the postinstallation steps in the Availability Manager installation instructions for OpenVMS systems for more details on how to enable collection of managed object data.

1.3.3 Preliminary Wildfire/Galaxy Support

When monitoring OpenVMS Alpha Version 7.3 nodes, Availability Manager Version 2.0 provides new information to support NUMA or OpenVMS Resource Affinity Domains (RADs).
This information includes the following:

o New Memory Details page for OpenVMS Alpha Version 7.3 nodes
o New Single Process Memory page for processes running on OpenVMS Alpha Version 7.3 nodes
o CPU modes show the RAD for a CPU
o CPU process list shows the home RAD for each process
o Node Summary page shows the number of RADs configured, the system serial number, and the Galaxy ID of a node, if any

1.3.4 New Switched LAN Displays

When you monitor OpenVMS Version 7.3 nodes with managed objects enabled, additional cluster data and fixes are available for LAN virtual circuits. This includes enhanced LAN virtual circuit summary data in the Cluster Summary window and the LAN Virtual Circuit Details (NISCA) window. In addition, the Cluster Summary now includes virtual circuit, channel, and adapter fixes.

If managed object support is not enabled for a Data Collector node, then only basic virtual circuit data is available.

1.3.5 New User-Defined Event Notifications

You can now enable user event notifications, similar to those available in DECamds, on both Windows and OpenVMS platforms. Users can specify scripts to be executed when events occur, and these scripts can perform actions such as sending e-mail or phoning a pager.

1.3.6 Built-in Browser for Display of Online Documentation

The display of online documentation is now faster and no longer dependent on system browsers with their potential incompatibilities. The Availability Manager Version 2.0 GUI contains its own HTML browser.

1.3.7 Built-in Java Run-time Environment on OpenVMS

Prior releases of the Availability Manager required you to install specific versions of Java on your system. Availability Manager Version 2.0 does not depend on any installed version of Java; instead, it includes its own, private Java run-time environment in the kit.

1.3.8 ODS-5 File System Support

It is now possible to display page and swap files on ODS-5 disks, including files with very long file names and special characters.

1.3.9 Support for the New PGFLQUOTA Process-Level Fix

A new process-level fix permits you to dynamically adjust the target process PGFLQUOTA. This operates in much the same way as other process limit fixes do.

1.3.10 Simpler Mechanism for Site-Specific Configuration Setup

OpenVMS kits for DECamds Version 7.3A and Availability Manager Version 2.0 provide a template file that system managers can modify to define the logical names used by the Data Collector. The file, SYS$MANAGER:AMDS$SYSTARTUP.TEMPLATE, can be copied to SYS$MANAGER:AMDS$SYSTARTUP.COM and edited to change the default logicals that are used to start the Data Collector and to locate its configuration files.

The two most common logicals, especially in a mixed-environment cluster configuration, are the following:

o AMDS$GROUP_NAME - specifies the group that this node will be associated with when it is monitored.
o AMDS$CONFIG - specifies the location of the security file used by the Data Collector.

1.4 Additional Information

This section contains additional information about the Availability Manager.

1.4.1 Recognizing a System Failure Forced by the Availability Manager

Because a user with suitable privileges can force a node to fail from the Data Analyzer by using the "Crash Node" fix, system managers have requested a method for recognizing these particular failure footprints so that they can distinguish them from other failures. These failures all have identical footprints: they are operator-induced system failures in kernel mode at IPL 8.

The top of the kernel stack is similar to the following display:

   SP => Quadword system address
         Quadword data
         1BE0DEAD.00000000
         00000000.00000000
         Quadword data         TRAP$CRASH
         Quadword data         SYS$RMDRIVER + offset

2 Configuration, Setup, and Installation Notes

The following notes pertain to configuring, setting up, and installing the Availability Manager.

2.1 Recommended Hardware Configurations

There are no minimum hardware requirements for the Data Collector.
Compaq recommends using, at a minimum, one of the following hardware configurations on systems running the Data Analyzer:

___________________________________________________________
System              Hardware
___________________________________________________________
Windows NT/Windows  300 MHz Intel Pentium processor with 96
2000                MB of memory

OpenVMS             500 MHz Alpha processor with 128 MB of
                    memory
___________________________________________________________

2.2 Changing Your Shortcut to Invoke Availability Manager

If you use your own shortcut to invoke the Availability Manager, you need to change it to reflect a change in the command line for Version 2.0 of the product. To make this change, follow these instructions for either Windows NT or Windows 2000:

1. Depending on your system, perform one of the following:

   o Windows NT: Use Windows Explorer to locate the shortcut you created, right-click it, and select Properties.
   o Windows 2000: Right-click the shortcut you created, and select Properties.

2. Click the Shortcut tab.

3. Use the arrow keys to scroll the Target line. After Swingall.jar, add ;AvailDocs.jar (type a semicolon before the file name).

4. Click OK at the bottom of the page.

2.3 Notes on Installing the Data Analyzer on OpenVMS Systems

The following notes pertain to the installation of the Availability Manager Data Analyzer on OpenVMS systems.

2.3.1 Enabling and Disabling Kernel Multithreading

On multiple-CPU OpenVMS systems, the logical name AMDS$AM_MULTITHREADING controls whether or not the Availability Manager runs on multiple CPUs (that is, whether it uses kernel multithreading). This logical name is defined in the SYS$MANAGER:AMDS$LOGICALS.COM file.

Setting AMDS$AM_MULTITHREADING to TRUE can improve application performance, but at the cost of application stability. Setting the logical name to FALSE (the default) forces the application to run on a single CPU.
For the current set of patches available on OpenVMS, this approach offers the greatest stability.

Enabling and Disabling Commands

To enable kernel multithreading, set the logical to TRUE:

   $ AMDS$DEF AMDS$AM_MULTITHREADING TRUE

To disable kernel multithreading, set the logical to FALSE:

   $ AMDS$DEF AMDS$AM_MULTITHREADING FALSE

2.3.2 PCSI Installation Messages

If you install DECamds Version 7.3A after installing Availability Manager Version 2.0, you might see any of the following PCSI messages:

%PCSI-I-RETAIN, file [SYS$LDR]SYS$RMDRIVER.EXE was not replaced
 because file from kit does not have higher generation number
%PCSI-I-RETAIN, file [SYS$LDR]SYS$RMDRIVER.STB was not replaced
 because file from kit does not have higher generation number
%PCSI-I-RETAIN, file [SYS$STARTUP]AMDS$STARTUP.COM was not replaced
 because file from kit does not have higher generation number
%PCSI-I-RETAIN, file [SYS$STARTUP]AMDS$STARTUP.TEMPLATE was not replaced
 because file from kit does not have higher generation number
%PCSI-I-RETAIN, file [SYSEXE]AMDS$RMCP.EXE was not replaced
 because file from kit does not have higher generation number
%PCSI-I-RETAIN, module AVAIL was not replaced
 because module from kit does not have higher generation number
%PCSI-I-RETAIN, file [SYSMGR]AMDS$DRIVER_ACCESS.DAT was not replaced
 because file from kit does not have higher generation number
%PCSI-I-RETAIN, file [SYSMGR]AMDS$DRIVER_ACCESS.TEMPLATE was not replaced
 because file from kit does not have higher generation number
%PCSI-I-RETAIN, file [SYSMGR]AMDS$LOGICALS.COM was not replaced
 because file from kit does not have higher generation number
%PCSI-I-RETAIN, file [SYSMGR]AMDS$LOGICALS.TEMPLATE was not replaced
 because file from kit does not have higher generation number

These messages are to be expected because DECamds and the Availability Manager share all the files cited.

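
If you save the installation output to a text file and copy it to a system with Python, a short script can help confirm that the only PCSI messages present are the expected RETAIN messages. The following is a minimal sketch, not part of the kit: the function names and the idea of scanning a saved log are assumptions made for illustration, and the pattern is based only on the message text quoted above.

```python
import re

# Matches the informational message quoted in Section 2.3.2. The retained
# item is either a "file" or a "module". \s+ tolerates messages that are
# wrapped across lines in the saved log.
RETAIN = re.compile(
    r"%PCSI-I-RETAIN,\s+(file|module)\s+(\S+)\s+was\s+not\s+replaced"
)

# Generic %PCSI-<severity>-<ident> message prefix.
ANY_PCSI = re.compile(r"%(PCSI-[A-Z]-[A-Z0-9_]+)")


def retained_items(log_text):
    """Return (kind, name) pairs for each RETAIN message in the log."""
    return [(m.group(1), m.group(2)) for m in RETAIN.finditer(log_text)]


def other_pcsi_codes(log_text):
    """Return the sorted set of PCSI message codes other than RETAIN."""
    codes = set(ANY_PCSI.findall(log_text))
    codes.discard("PCSI-I-RETAIN")
    return sorted(codes)


# Two of the messages listed above, used as sample input.
sample = (
    "%PCSI-I-RETAIN, file [SYS$LDR]SYS$RMDRIVER.EXE was not replaced\n"
    " because file from kit does not have higher generation number\n"
    "%PCSI-I-RETAIN, module AVAIL was not replaced\n"
    " because module from kit does not have higher generation number\n"
)

if __name__ == "__main__":
    for kind, name in retained_items(sample):
        print(kind, name)
    print("other PCSI codes:", other_pcsi_codes(sample))
```

An empty list from other_pcsi_codes suggests that the log contains nothing beyond the RETAIN messages described in this section. Any other code it reports should be reviewed separately.
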
2.4 Notes on Installing the Data Analyzer on Windows Systems

The following notes pertain to the installation of the Availability Manager Data Analyzer on Windows NT and Windows 2000 systems.

2.4.1 Upgrading to Windows 2000

If you upgrade to Windows 2000 after installing the Availability Manager Version 2.0 on Windows NT 4.0, you must reinstall the Version 2.0 kit. When you reinstall, select the "Modify" option on the Windows Installation Welcome box. The reinstallation installs Windows 2000-compatible network drivers.

2.4.2 Running the Self-Extracting .EXE Multiple Times

The Availability Manager software for Windows systems is packaged in a self-extracting executable (.EXE). If you run multiple installations of Availability Manager Version 2.0, the .EXE unpacks the installation in the same temporary folder. As a result of a duplicate installation, the system displays a message box entitled Overwrite Protection, which contains a message that "the following file is already installed on your system... Do you wish to overwrite the file?" You can ignore these messages. Click Yes to All.

2.4.3 Unable to Enter Standby or Hibernate States, or to Disable a Network Card on Windows 2000

If you install Availability Manager Version 2.0 software on a Windows 2000 system such as a laptop, you will not be able to place the system in standby or hibernate states, or disable the network adapter that DECNDIS is bound to. Compaq is working to correct these problems in a future release.

2.4.4 Registry Subkey Message

In some situations during an installation, the system displays the message "Registry Service Subkey already exists." You can ignore this message.

2.4.5 Self-Extracting Executable Does Not Exit

In some situations, the self-extracting executable extracts the installation package but does not exit and start the installation. When this occurs, the system displays the "Unpacking" progress bar, and then nothing happens.
Windows Task Manager shows the self-extracting executable as an active process, but it appears to be stalled. To activate the Availability Manager installation, press Ctrl + Alt + Delete, and then choose Cancel. The InstallShield progress bar then appears, and the installation continues normally.

2.4.6 Problem with the Reboot Dialog Window on Intel Platforms if Another Window Is Open

If you have any other window open (such as the Windows Explorer) during an installation, this window will be in front of the reboot dialog box at the end of the installation. Look for InstallShield Wizard in the task bar, and single-click it to bring the reboot window to the front. Note that you will also see this problem at the end of an uninstall operation.

2.4.7 Problem with the Shared Files Dialog Window on Intel Platforms During an Uninstall Operation

If you have any other window open (such as the Windows Explorer) during an uninstall operation, the status box is moved to the back when the uninstall operation encounters a shared file to be removed. Look for InstallShield Wizard in the task bar, and single-click it to bring the message box about the shared file to the front. You can then click Yes to remove a shared file or No to keep the file.

3 Startup and Shutdown Notes

The following notes pertain to starting up and shutting down the Availability Manager.

3.1 Restarting After an Uninstall Operation on a Windows System

To uninstall the Availability Manager on a Windows system using Add/Remove Programs on the Windows Control Panel, follow these steps:

1. Uninstall the software.

2. Restart the system.

   This step completes the removal of the network bindings.

3. Optionally, reinstall the software.

If you omit step 2, starting the Availability Manager could cause the system to fail. To recover from this situation, restart the system and then reinstall the Availability Manager (uninstalling the software again is not necessary).
Finally, restart your system at the end of the installation. The Availability Manager should run properly.

4 Operation Notes

Availability Manager operation notes fall into the following categories:

o General information
o Known problems

4.1 General Information

The notes in this section contain information about the general operation of the Availability Manager.

4.1.1 Higher Event Thresholds Set

To accommodate the needs of today's high-end systems under varying workloads and to allow for future growth, Compaq has increased the maximum settable values for the event thresholds of the following OpenVMS events to 1,073,741,823:

   DSKQLN   HIBIOR   HIDIOR   LOASTQ   LOBIOQ   LOBYTQ
   LODIOQ   LOENQU   LOFILQ   LOPGFQ   LOPGSP   LOSWSP
   LOTQEQ   LOVLSP   LOWEXT   LOWSQU   PRBIOR   PRDIOR

4.1.2 Data Collection and Events on OpenVMS Nodes

Node summary data is the only data that is collected by default. The Availability Manager looks for events only in data that is being collected.

You can collect additional data in either of the following ways:

o Opening any display page that contains node-specific data (for example, CPU, memory, I/O) automatically starts foreground data collection and event analysis, except for Lock Contention and Cluster Summary information; you must select these tabs individually to start foreground data collection. Collection and evaluation continue as long as a page with node-specific data is displayed. Refer to the nodes chapter in the manual for details.

o Clicking a check mark on the Customize OpenVMS... menu Data Collection page enables background collection of that type of data. Data is collected and events are analyzed continuously until you remove the check mark. Refer to the overview and customization chapters in the manual for details.

4.1.3 Limit Your Background Collection of Detailed Data

By default, the only data collected on OpenVMS nodes is node summary data. You can collect this data on many nodes without incurring performance problems.
If you do not have a high-performance workstation, and you have many nodes configured, be careful about enabling more data collection on the customization Data Collection page. This is especially true when you run the Data Analyzer on OpenVMS systems.

A new feature since Version 1.3 might help satisfy your data collection needs: when you open a node-specific data page, all types of data are automatically collected for that node.

4.1.4 Size of Event Log

If you are collecting data on many nodes, running the Availability Manager for a long period of time can result in a large event log. For example, in a run that monitors more than 50 nodes with most of the background data collection enabled, the event log can grow by up to 30 MB per day. At this rate, systems with small disks might fill up the disk where the event log resides.

Closing the Availability Manager application will enable you to access the event log for tasks such as archiving. Starting the Availability Manager starts a new event log.

4.2 Known Problems

The notes in this section discuss known problems with Version 2.0 of the Availability Manager.

4.2.1 Single-Disk Free Block Count Incorrect When Mounted Clusterwide

Although correct for disks that are not mounted clusterwide, the free block count for cluster disks is incorrect in the single disk display. This display shows the disk from the perspective of each node in the cluster with that disk mounted.

4.2.2 Problem with Daylight Saving Time Changes

For some time zones, especially European ones, the time-zone logic in the Java software libraries that the Data Analyzer uses might disagree with the Windows operating system about when the shift to daylight saving time occurs. For a two-week period in early April and late October, you might see a one-hour discrepancy between the time shown in the Data Analyzer and the time of day shown by the system and the Date-Time Control panel.
Also, Sun's Java classes disagree with Windows about whether daylight saving time even exists for Asian time zones. The Windows DateTime CP usually indicates that daylight saving time is not possible for these zones; time strings generated from the calendar classes in Java appear to recognize a daylight saving time shift. Therefore, for all time zones between eastern Europe, going east to Alaska, a one-hour discrepancy is likely from April through October. This discrepancy occurs for months at a time.

For OpenVMS systems, make sure that the time zone differential logical name SYS$TIMEZONE_DIFFERENTIAL is defined correctly.

4.2.3 Event Reporting Problems

The following list contains known event reporting problems that have been reported in Version 2.0:

o Unimplemented threshold events:

     LOSTVC   NOPROC

o Event reporting irregularities:

  - Some posted events may not be canceled promptly when the condition goes away.
  - LOVOTE and LOVLSP events are posted for every node in the cluster rather than once per cluster.

4.2.4 Out-of-Memory Problems on Long Runs

If a session runs for many days, and the Data Analyzer is collecting data on many nodes, the Data Analyzer might run out of virtual memory (object heap). (See the Availability Manager installation instructions for Windows or OpenVMS for details on how to modify the heap size.)

On Windows systems, the Data Analyzer does not report the problem. On OpenVMS systems, the Data Analyzer displays an "OutOfMemoryException" error in the window in which the Data Analyzer was started. On either system, one or more parts of the display might stop updating. The only workaround is to restart the Data Analyzer.

4.2.5 User with Inadequate Page File Quota Cannot Run OpenVMS Data Analyzer

If a user with inadequate page file quota (PGFLQUOTA) tries to run the Availability Manager Data Analyzer on OpenVMS, an error message is displayed and the application stops.
Inadequate PGFLQUOTA causes unusual behavior in the OpenVMS Java Virtual Machine, preventing the Availability Manager from starting and running normally. Please refer to the OpenVMS Installation Instructions for the appropriate PGFLQUOTA settings.

4.2.6 Data Analyzer Might Not Recognize Impromptu Operating System Upgrades

If the Availability Manager Data Analyzer is monitoring an OpenVMS node that is shut down and then restarted with a different version of the operating system, the Data Analyzer does not recognize the change. Displays for this node continue to show the previous operating system version, and data collection for this node might also be affected.

5 Display Notes

The following notes pertain to the display of data on Availability Manager pages and have been organized under the following headings:

o Problems Using the Data Analyzer on All Platforms
o Problems Using the Data Analyzer on OpenVMS Systems

5.1 Problems Using the Data Analyzer on All Platforms

The problems discussed in this section apply to running the Data Analyzer on all platforms.

5.1.1 What to Do If a Node is Displayed Twice

A node can be displayed twice in the Node pane when the Data Collector (RMDRIVER) is started before the network transports are started. To avoid this problem, always start your network transports (DECnet) before starting the Availability Manager Data Collector.

5.1.2 Incomplete Repainting of Windows

If you obscure part of an Availability Manager window with another window, the obscured portion of the Availability Manager window might not repaint completely when you move the top window. This appears to be a Java Swing problem that is currently under investigation.

5.1.3 Page and Swap File Names in Event List Display

If page and swap file events are signaled before the Data Analyzer has resolved their file names from the file ID (FID), events such as LOPGSP display the FID instead of the file name information.
You can determine the file name for the FID by checking the File Name field in the I/O Page Swap Files page. The FID for the file name is displayed after the file name.

5.1.4 Events Are Sometimes Displayed After Background Collection Stops

On both OpenVMS and Windows systems, the Data Analyzer sometimes displays events after users customize their systems to stop collecting a particular kind of data. This is most likely to occur when the Data Analyzer is monitoring many nodes. Under these conditions, a data handler sometimes clears events before all pending packets have been processed. The events based on the data in these packets are displayed even though users have requested that this data not be collected.

5.1.5 Truncated LAN Path (Channel) Summary Display

The LAN Path (Channel) Summary display might be disabled for some OpenVMS nodes if there are more than seven channels for that virtual circuit. This problem results from a restriction in the OpenVMS Version 7.3 PEDRIVER. When this restriction is removed (in remedial releases or later versions of OpenVMS), the full channel summary will be displayed.

For this condition, the following error message is displayed:

   Error retrieving ChSumLAN data, error code=0x85
   (Continuation data disallowed for request)

5.2 Problems Using the Data Analyzer on OpenVMS Systems

The problems discussed in this section apply to running the Data Analyzer on OpenVMS systems.

5.2.1 Problem Exiting Field on OpenVMS Data Collection Customization Page

In customizing the OpenVMS Data Collection page on OpenVMS, if you change a data collection interval and press Enter to exit the field, the value is not entered as expected. You must use the mouse to move the cursor out of the field.

5.2.2 Long Runs Exhaust XLIB Resource ID

The version of Motif currently shipping with OpenVMS is based on X11R5. That release of X11 uses a resource ID allocation scheme that works poorly with the Motif support in Java for OpenVMS.
As a result, most long-running Availability Manager sessions will stop updating the display at a time that depends on the speed of the OpenVMS machine. For example, a session running on a dual-processor 275 MHz system reported the following after 14 hours:

   Xlib: resource ID allocation space exhausted!

On faster machines, this message was reported after only 8 hours. This problem is under investigation.