Benchmark Descriptions
DIGITAL Servers for Windows NT Family
This document is a companion piece to the
DIGITAL Servers for Windows NT Performance Flash.
It provides summary descriptions (written by the
sponsoring organization) and significance of the
industry- or de facto-standard benchmark results
for the following tests:
Application: Exchange LoadSim and Lotus
NotesBench
Compute & Throughput: AIM Server for
NT (Domain Server Mix, File Server Mix) and
AIM Suite VII for UNIX; Ziff-Davis (NetBench,
ServerBench, and WebBench); and the SPEC
Suite.
Database: Transaction Processing Performance Council
(TPC-C)
Metrics: Benchmarks can be used
for measurement in various ways:
Speed measurement: how fast does
this test run, based only on speed of
computation?
Throughput measurement: how does the
system do on an overall test score,
considering a representative mix of the
computational and I/O tasks involved in this
application?
Workload measurement: some benchmarks determine a
"high-water mark," the maximum workload
supported by this system for a particular
application or test.
Each benchmark offers different
metrics, capturing either processor compute speed
and/or maximum workload supported, or a
quantification of overall system throughput.
For each test, we offer a summary description
of the test (extracted from the website of the
sponsoring vendor or consortium), along with
commentary regarding DIGITAL's experience and
differentiation. URL hyperlinks (for Word97) are
provided to each sponsoring organization's
web page, where detailed information is
available.
Application Benchmarks
I. Microsoft
Exchange (LoadSim) http://www.backoffice.microsoft.com/downtrial/moreinfo/loadsimulator.asp
Vendor
Summary: Microsoft Exchange Load Simulator
(LoadSim) Version 5.5 is a multi-client messaging
emulation tool for the MAPI protocol. It is used
to test the performance of Microsoft Exchange
Server under varying message loads. Specific
workloads can be defined in Load Simulator to
exercise servers in a controlled manner. This
information can then be used to help determine
the maximum and optimum number of users per
server, identify performance bottlenecks and
evaluate server hardware performance.
This test measures the overall performance of
Exchange Server when used as a messaging
platform. Typical client actions that are
simulated by LoadSim include creating, reading,
deleting, and replying to messages of varying
sizes. The test involves applying a standard load
and measuring response time. This test used the
standard LoadSim "medium user" profile,
which is similar to the typical business user. A
valid LoadSim test is recorded when 95
percent of the LoadSim test cycles had a response
time of less than one second and there were no
leftover work items in the Exchange Server
queues.
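For illustration, the validity criterion just described can be expressed as a small Python check. This is not part of LoadSim itself; the function name and response-time figures below are hypothetical.

# Illustrative sketch only: checks the LoadSim validity criterion described
# above -- at least 95 percent of test cycles must complete in under one
# second and no work items may remain in the Exchange Server queues.

def loadsim_run_is_valid(response_times_ms, leftover_queue_items):
    """Return True if 95% of cycles finished in < 1000 ms and the queues are empty."""
    within_one_second = sum(1 for t in response_times_ms if t < 1000)
    pct = within_one_second / len(response_times_ms) * 100
    return pct >= 95.0 and leftover_queue_items == 0

# Hypothetical example: 1,000 cycles, 96 percent under one second.
sample_times = [250] * 960 + [1200] * 40
print(loadsim_run_is_valid(sample_times, leftover_queue_items=0))  # True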
Metrics: The Exchange LoadSim
tests yield metrics for maximum number of users
supported (LoadSim Exchange Users) under
different types of workload mixes.
DIGITAL Commentary: The LoadSim
benchmark is a comprehensive test, measuring
overall throughput of the system. The benchmark
simulates a certain amount of mail load (quantity
and size of mail) over a certain period to be
delivered ("send queue" emptied) to
another server within a certain time frame. By
using different LoadSim settings, you may be able
to increase the number of users supported, but
actual delivery of that mail may be allowed to
take more time. To perform these tests, DIGITAL
used the medium workload setting.
The DIGITAL Server Models 7105 and 7310 offer
the best performance and scalability for
enterprise Exchange installations in the
industry. Where server consolidation is a
customer goal, the additional scalability of the
Alpha-based 7310 will contribute to lower cost of
ownership. The Intel-based DIGITAL Server 7105
beats competitors' offerings on
price/performance, based on a comparison of
results that are publicly available. Independent
consultants have written about the choice of
Alpha for Exchange; please refer to the following
reports for additional information. http://www.digital.com/messaging
"Rightsizing Microsoft
Exchange: When to Choose an ALPHA
Solution" (Sept 97) Creative
Networks, Inc. "Bringing NT
and Exchange to the Enterprise---Digital
Takes a Commanding Lead", Aberdeen
Group (Nov 1997)
An interactive
Cost of Ownership calculation tool for Microsoft
Exchange developed by Creative Networks, Inc. can
be downloaded from http://www.partner.digital.com/sbu/9markets/mailmessage/salestools/serveredge/cooserver1.html
II. Lotus Domino NotesBench http://www.notesbench.org/
Vendor Summary: NotesBench is a tool developed by Lotus
Development Corporation that enables hardware
vendors and distribution partners to directly
provide Lotus Notes customers with relative
capacity and performance information on various
platforms and configurations. NotesBench is a
performance characterization tool that simulates
client load using remote emulators, executing
transactions against the server under test. It is
a closed tool; test results can be disclosed only
after a successful audit by an independent
company. In this way, each vendor's result can be
directly compared.
There are five test suites that comprise
NotesBench; however, the most commonly measured
suite is the active mail users test. In this
workload, users perform mail and simple shared
database operations, and the reporting metric is
the maximum number of users that can be supported
before response time becomes unacceptable.
Metrics: Results are provided in
number of users supported, and the throughput
measure is NotesMarks.
DIGITAL Commentary: DIGITAL continues
to hold the high watermark in the battle for
Windows NT NotesBench supremacy, achieving record
results in the number of concurrent mail users
supported. DIGITAL tested a range of Intel- and
Alpha-based servers. At 6,000 Mail users, a quad
processor DIGITAL Server 7305R (533 MHz Alpha
processors) exceeded all NotesBench results to
date on any Windows NT Server! The DIGITAL Server
5305 (Alpha-based dual processor) supports 4,000
Mail users. Both results surpassed their nearest
competitor by almost 17%! The DIGITAL Server 3305
(Alpha-based single processor) supports 2,000
Mail users. On the Intel side, the DIGITAL Server
7105 (four-processor Intel Pentium Pro) delivers
5,160 Mail users, while a DIGITAL Server 3200
(single processor Pentium Pro) delivers 1,950
Mail users.
DIGITAL has tested its servers using three
NotesBench workloads - Mail Only, MailDB, and
Groupware - roughly corresponding to Light,
Medium, and Heavy usage. For more information on
DIGITAL Servers with Lotus Domino solutions, see
the FAQs at URL http://www.digital.com/messaging/lotus/lotusfaq.html
Based upon audited results available to date,
DIGITAL Servers offer the best scalability and
server consolidation benefits in the industry, as
documented in the following Seybold Group paper.
"DIGITAL Servers in a
Notes/Domino Environment: Meeting
Enterprise Customer Needs," Patricia
Seybold Group (Sept. 1997)
Compute
& Throughput Benchmarks
I. AIM
Benchmarks http://www.aim.com/
The AIM Server Benchmark for Windows NT http://www.aim.com/NT_server.html
Sponsor Summary: AIM Technology's Server Benchmark
for Windows NT is a system-level
WIN32-compliant Benchmark for the Microsoft
Windows NT operating system. This benchmark
utilizes AIM's proven load-mix modeling
technology in a multi-threading and
multi-processing environment. It is designed to
test overall system performance of
standard Windows NT Server configurations on
Alpha and Intel platforms.
AIM Technology uses Load/Mix Modeling to test
how well servers perform under different
application loads. The role of Load/Mix modeling
is to allow AIM to apply any type of load to a
system running the Windows NT operating system.
The benchmark includes a pre-defined set of
application mixes to model the most general uses
of server systems. Two initial application mixes
for the Server Benchmark are: Domain Server Mix
and File Server Mix.
Domain Server Mix v2.0/Windows NT
The AIM Domain Server Mix/Windows NT is
composed of 50 different tests from all subsystem
categories. The Domain Server Mix represents a
balanced usage of subsystems that are configured
as a typical enterprise shared server. The major
tasks performed by the typical domain server
include light file transfers, network routing and
packet forwarding, email, shared applications
such as spreadsheets and word processors, and
network maintenance.
File Server Mix v2.0/Windows NT
The AIM File Server Mix/Windows NT is composed
of 37 different tests from all major subsystem
categories. The File Server Mix represents a
balanced usage of subsystems that are configured
as a gateway file server. The major tasks
performed by these file servers include file
transfers of various sizes (both synchronous and
asynchronous), network routing and packet
forwarding, system security and access permission
checking, heavy memory usage and IPC calls.
The AIM MultiUser Suite VII for UNIX
Servers
Multiuser systems are used for a wide variety
of reasons. The AIM multiuser benchmark was
designed to test the performance of systems
ranging from compute servers to file servers, as
well as multiuser systems that are used primarily
to maintain databases. This benchmark runs on the
most advanced systems and tests features required
by modern Open Systems multiuser environments.
The benchmark includes two standard mixes of
tests that cover the "standard" uses of
large computer systems. If the system is heavily
used as a shared application server or a file
server, you can use one of the standard mixes to
test the system. The standard mixes for the
multiuser benchmark are the Multiuser/Shared
Application Server Mix and the File Server Mix.
The Multiuser/Shared
Application Server Mix models a multiuser
environment emphasizing office automation: word
processing, spreadsheet, email, database,
payroll, and data processing. This mix represents
a broad use of different applications, as opposed
to a great deal of emphasis on one type of
application. This mix of tests models the wide
variety of operations that are commonly found on
shared multiuser systems. The mix includes
substantial testing of calculations, file system
interaction, shell operations, and program execution. There
is also some emphasis placed on Interprocess
Communications (IPC).
The File Server Mix models many integer
compute and file system operations in heavy
concentration. This mix helps users measure the
machine's I/O capabilities. Some emphasis is
placed upon non-I/O issues including integer
calculations, data searches and system
interactions. All tests are run locally on the
system and do not require a network connection.
DIGITAL Commentary: The performance of
the DIGITAL Server line on the AIM benchmarks is
well established by the AIM Hot Iron Awards (See
URL: http://www.aim.com/pm_awards.html).
Digital won 10 Awards in April 1998, more than
any other vendor, continuing its performance
sweep.
DIGITAL runs the MultiUser Suite VII tests
under the SCO UnixWare 2.1.1 operating system on
Intel-based DIGITAL Servers.
II. Ziff-Davis Inc. Benchmarks http://www1.zdnet.com/zdbop/zdbop2.html
Sponsor Summary:
NetBench 5.01 -- measures the
performance of a file server by measuring how
well it handles file I/O requests from as many as
four different client types: DOS, 32-bit
Windows, 16-bit Windows, and/or Mac OS
systems. The clients pelt the server with
requests for network file operations. Each client
tallies how many bytes of data it moves to and
from the server and how long the process takes.
The client uses this information to calculate its
throughput for that test mix. NetBench adds all
the client throughputs together to produce the
overall throughput for a server. Latest release:
4/21/97.
http://www1.zdnet.com/zdbop/netbench/netbench.html
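As a rough illustration of the arithmetic described above (not ZD's actual implementation), the following Python sketch derives each client's throughput from its byte and time tallies and sums them into the overall server figure; the client names and numbers are made up.

# Illustrative sketch of the NetBench bookkeeping described above: each
# client records bytes moved and elapsed time, computes its own throughput,
# and the overall server score is the sum of the client throughputs.
# All figures below are hypothetical.

clients = [
    {"name": "dos-1",   "bytes_moved": 180_000_000, "seconds": 600},
    {"name": "win32-1", "bytes_moved": 420_000_000, "seconds": 600},
    {"name": "win16-1", "bytes_moved": 260_000_000, "seconds": 600},
]

per_client = {c["name"]: c["bytes_moved"] / c["seconds"] for c in clients}
overall = sum(per_client.values())

for name, tput in per_client.items():
    print(f"{name}: {tput / 1_000_000:.2f} MB/s")
print(f"Overall server throughput: {overall / 1_000_000:.2f} MB/s")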
ServerBench
4.01 -- measures the performance of application
servers in a client/server environment by running
tests that produce different types of load on the
server. The ServerBench test environment includes
the server you're testing, its PC clients, and a
PC designated as the controller (you execute
and monitor test suites from the controller).
The clients and the controller must run either
Windows 95 or Windows NT. The server may run any
one of a number of operating systems. Latest
release: 12/19/97.
http://www1.zdnet.com/zdbop/svrbench/svrbench.html
WebBench
™ 2.0 -- measures the performance of Web
server software by returning two overall server
results: the total requests per second the Web
server handled for all the clients in the test
and the server throughput (in bytes per second).
WebBench provides both static standard test
suites and dynamic standard test suites. The
static test suites access only HTML, GIF, and a
few sample executable
files. They do not run any programs on the
server. The dynamic test suites execute
applications that actually run on the server.
They use CGI applications created for several
server platforms.
http://www1.zdnet.com/zdbop/webbench/webbench.html
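As a rough illustration (hypothetical figures, not ZD code), the two overall WebBench results reduce to simple totals divided by the test duration:

# Hypothetical WebBench-style aggregation: total requests per second handled
# for all clients, plus overall throughput in bytes per second.
test_seconds = 300
clients = [
    {"requests": 45_000, "bytes_received": 520_000_000},  # static-suite client
    {"requests": 12_000, "bytes_received": 95_000_000},   # dynamic (CGI) client
]
requests_per_second = sum(c["requests"] for c in clients) / test_seconds
bytes_per_second = sum(c["bytes_received"] for c in clients) / test_seconds
print(f"{requests_per_second:.1f} requests/sec, {bytes_per_second:,.0f} bytes/sec")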
The following
server evaluation is from the November 1997
Personal Computer Magazine, published by
Ziff-Davis: Pentium Pro and Pentium II Server
Review: DIGITAL Servers (formerly Prioris) shine
across the board! http://www.zdnet.com/products/content/pccg/1011/pccg0102.html
NetBench: The
NetBench test showcases a server's ability to
traffic files from different types and numbers of
clients. In other words, it measures I/O
throughput. The Digital Prioris delivered speedy
performance even when more than 32 clients were
making requests. The other servers tended to lose
steam under heavy loads, with the Polywell Poly
2X266TD2 bringing up the rear.
WebBench: If your Web site is underperforming
at bringing in revenues, it's quite possible that
your server is just plain underperforming. In Web
server benchmarks, Pentium Pro systems like the
SAG STF QuadPro strained when more than 32
requests were made. The Digital Prioris shone
across the board, even when handling multiple
requests.
ServerBench: Forget I/O for a minute and ask
yourself how fast a server delivers applications.
The Xi NetRAIDer soared with up to 4 clients
attached, but stumbled with 60. The 266 MHz
Pentiums outpaced the 200 MHz SAG STF QuadPro when
loads were low, but SAG's server pulled ahead
when network traffic resembled the Santa Monica
Freeway at rush hour. The Digital Prioris HX 6266
was the only system that held up nicely across
the board.
DIGITAL Commentary: We appreciate
ZD's complimentary evaluation of the
Intel-based DIGITAL Server (formerly Prioris)
family, and would note only that, since this lab
evaluation was conducted, DIGITAL has introduced
two upgrades of its Intel-based DIGITAL Server
models.
III. The Standard Performance Evaluation
Corporation (SPEC Suite) http://www.specbench.org/
SPEC CPU95: Metrics include SPECint95,
SPECfp95, SPECint_base95, SPECint_rate95, etc.
for Integer and Floating Point compute speed, and
SPECrates for throughput. The benchmark was
announced in August '95.
Sponsor Summary: SPEC is a non-profit
corporation formed to establish and maintain
computer benchmarks for measuring component- and
system-level computer performance. SPEC95
is a software benchmark product produced by SPEC.
It was designed to provide comparable measures of
performance for comparing compute-intensive
workloads on different computer systems. SPEC95
contains two suites of benchmarks:
CINT95: for measuring/comparing
compute-intensive integer performance (commercial
applications).
CFP95: for measuring/comparing
compute-intensive floating point performance
(scientific/numeric).
Being compute-intensive benchmarks, these
benchmarks emphasize the performance of the
computer's processor, the memory architecture and
the compiler. It is important to remember the
contribution of the latter two components;
performance is more than just the processor. The
CINT95 and CFP95 benchmarks do not stress other
computer components such as I/O (disk drives),
networking or graphics, as the percentage of time
spent in operating system and I/O functions is
generally negligible. Note that it may be
possible to configure a system in such a way that
one or more of these components impact the
performance of CINT95 and CFP95. However, that is
not the intent of the suites.
Peak (optimized) vs.
'Baseline' (conservative) measurements
In 1994, the SPEC Open Systems Steering
Committee decided to introduce "baseline
results." The results (for both speed and
throughput measurements) have to be measured with
more restrictive run rules, regulating the use of
compiler/linker optimization options
("flags"). As a general guideline, a
system vendor is expected to endorse the general
use of the baseline options by customers who seek
to achieve good application performance.
The intention is that baseline results
represent the performance a not-so-sophisticated
user would achieve, whereas the traditional
"peak" rules allow a selection of
optimization flags that is more typical for
sophisticated users. When SPEC's CPU benchmark
results are reported, the reports must include
baseline results. Baseline-only reporting is
allowed. A test sponsor is free to mention only
peak results in marketing literature, but
baseline results must be available and provided
upon request.
The base metrics (i.e.,
"SPECint_base95") are required for all
reported results and have set guidelines for
compilation (i.e., the same flags must be used in
the same order for all benchmarks). The non-base
metrics (i.e., "SPECint95") are
optional and have less strict requirements (i.e.,
different compiler options may be used on each
benchmark).
1. Speed Measurement
There are several different ways to measure
computer performance. One way is to measure how
fast the computer completes a single task; this
is a speed measure. The SPEC speed metrics (i.e.,
SPECint95) are used for comparing the ability of
a computer to complete single tasks. The result
("SPEC Ratio" for each individual
benchmark) is expressed as the ratio of a fixed
"SPEC reference time" to the wall clock
time needed to execute one single copy of the
benchmark, so a larger ratio means a faster
system. For the CPU95 benchmarks, a
Sun SPARCstation 10/40 was chosen as the
reference machine.
The following metrics ("weighted
averages" which are geometric means of 8-10
individual tests) have been defined for speed
measurements with the CPU95 benchmarks:
SPECint_base95
SPECfp_base95
SPECint95
SPECfp95
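For illustration only, the following Python sketch shows the arithmetic behind these speed metrics as described above: each benchmark's SPEC Ratio is the reference time divided by the measured run time, and the reported metric is the geometric mean of the individual ratios. The timing figures are invented.

# Illustrative sketch of the SPEC CPU95 speed arithmetic described above.
# Each pair is (reference_time_seconds, measured_time_seconds) for one
# benchmark; the values are hypothetical.

import math

benchmarks = [(3700, 420), (2100, 250), (4600, 510), (1900, 230)]

ratios = [ref / measured for ref, measured in benchmarks]          # SPEC Ratios
speed_metric = math.exp(sum(math.log(r) for r in ratios) / len(ratios))  # geometric mean

print(f"Individual SPEC Ratios: {[round(r, 2) for r in ratios]}")
print(f"Geometric mean (speed metric): {speed_metric:.2f}")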
2. Throughput (Rate) Measurement
Another way to measure performance is to
determine how many tasks a computer can
accomplish in a certain amount of time; this is
called a throughput, capacity or rate measure.
The SPEC rate metrics (e.g., SPECint_rate95) measure the
throughput or rate of a machine carrying out a
number of tasks. The results express how many
jobs of a particular type (characterized by the
individual benchmark) can be executed in a given
time. (The SPEC reference time happens to be one
24-hour day, with the execution times normalized
with respect to the SPEC reference machine). The SPEC
rates therefore characterize the capacity of a
system for compute-intensive jobs of similar
characteristics. Similar to the speed metric,
SPEC has defined averages for throughput metrics:
SPECint_rate_base95
SPECfp_rate_base95
SPECint_rate95
SPECfp_rate95
Note: Because of the different units, the
values SPECint95/SPECfp95 and
SPECrate_int95/SPECrate_fp95 cannot be compared
directly.
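The rate idea can be illustrated with a simplified Python sketch. This is a loose reading of the description above, not the exact SPEC rate formula, and all figures are hypothetical.

# Simplified illustration of a throughput (rate) measure: run several copies
# of a benchmark concurrently and express the result as the number of such
# jobs the system could complete in a 24-hour day, normalized to the
# reference machine.  All numbers below are invented.

SECONDS_PER_DAY = 24 * 60 * 60

def jobs_per_day(copies_run, elapsed_seconds):
    """Copies completed concurrently, scaled up to a 24-hour day."""
    return copies_run * SECONDS_PER_DAY / elapsed_seconds

# Hypothetical: a 4-processor system runs 4 copies in 600 seconds,
# while the reference machine runs 1 copy in 3700 seconds.
system_rate = jobs_per_day(copies_run=4, elapsed_seconds=600)
reference_rate = jobs_per_day(copies_run=1, elapsed_seconds=3700)

print(f"System:    {system_rate:.0f} jobs/day")
print(f"Reference: {reference_rate:.0f} jobs/day")
print(f"Normalized rate: {system_rate / reference_rate:.1f}x the reference")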
The appropriate SPEC benchmark or metrics to
use will depend on the customer's
performance requirements. For example, a single
user running a compute-intensive integer program
may only be interested in SPECint95 or
SPECint_base95. On the other hand, a person who
maintains a machine used by multiple scientists
running floating-point simulations may be more
concerned with SPECfp_rate95 or
SPECfp_rate_base95.
DIGITAL Commentary: Because SPEC
measures computational performance, DIGITAL runs
these tests for Alpha-based models of the DIGITAL
Server line. The superior performance results for
NT environments (generally a factor of 2-3 times
better than the results attainable using 32-bit
hardware) are attributable to Alpha's 64-bit
hardware and floating point computational
capabilities. DIGITAL believes that communities
and markets benefiting from fast computational
performance may wish to avail themselves of the
benefits of the Windows NT operating environment.
SPECweb96 http://www.specbench.org/osg/web96/
A standardized
benchmark for WWW servers, announced in July '96.
It measures basic GET performance of static
pages. The benchmark runs an HTTP engine on a
number of driving "client" systems that
will GET a variety of pages from the server that
is being tested.
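For illustration, a driving client of the kind described above amounts to a loop that issues GET requests and counts completions. The Python sketch below is not the SPECweb96 harness; the server URL, page list, and run length are placeholders.

# Illustrative sketch only -- not the SPECweb96 test harness.  A driving
# client repeatedly issues HTTP GET requests for static pages and reports
# how many operations per second the server sustained.

import time
import urllib.request

BASE_URL = "http://server-under-test.example.com"      # placeholder server
PAGES = ["/file1.html", "/file2.html", "/file3.html"]  # placeholder static pages
DURATION_SECONDS = 30

def drive_gets():
    completed = 0
    start = time.time()
    while time.time() - start < DURATION_SECONDS:
        page = PAGES[completed % len(PAGES)]
        with urllib.request.urlopen(BASE_URL + page) as resp:
            resp.read()            # fetch the full static page
        completed += 1
    elapsed = time.time() - start
    print(f"{completed / elapsed:.1f} GET operations/second")

if __name__ == "__main__":
    drive_gets()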
DIGITAL runs the SPECweb suite on the Alpha
platform for the UNIX environment, with
industry-leading results in each processor class
(1P, 2P, 4P). We believe that as Windows NT
becomes commonly used within the scientific
computing community over the next few years,
Alpha NT will continue to lead the pack. DIGITAL
will consider an NT-specific Intranet or Internet
test suite when a de facto standard emerges for
NT.
Database Benchmarks
I. TPC-C http://www.tpc.org/
Vendor Summary: TPC-C is a de facto
industry standard for On-Line Transaction
Processing (OLTP). The test includes 5 different
types of transactions:
New-order: enter a new order from a customer
Payment: update customer balance to reflect a
payment
Delivery: deliver orders (done as a batch
transaction)
Order-status: retrieve status of
a customer's most recent order
Stock-level: monitor warehouse inventory
Metrics generated include tpm-C
(transactions per minute) and $/tpm-C.
An extract from the Transaction Processing
Performance Council's Web site (FAQs):
Q: What do TPC throughput numbers mean?
A: Throughput, in TPC terms, is a measure of
maximum sustained system performance. In TPC-C,
throughput is defined as how many New-Order
transactions per minute a system generates while
the system is executing four other transactions
types (Payment, Order-Status, Delivery,
Stock-Level). All five TPC-C transactions have a
certain user response time requirement, with the
New-Order transaction response time set at 5
seconds. Therefore, for a 710 tpmC number, a
system is generating 710 New-Order transactions
per minute while fulfilling the rest of the TPC-C
transaction mix workload.
Q: What do the TPC's price/performance numbers
mean?
A: TPC's price/performance numbers (e.g., $550
per tpmC) include much more than just the initial
cost of the computer or host machine. In general,
TPC benchmarks are system-wide benchmarks,
encompassing almost all cost dimensions of an
entire system environment the user might
purchase, including terminals, communications
equipment, software (transaction monitors and
database software), computer system or host,
backup storage, and three years' maintenance cost.
Therefore, if the total system cost is $859,100
and the throughput is 1,562 tpmC, the
price/performance is derived by taking the price
of the entire system ($859,100) divided by the
performance (1,562 tpmC), which equals $550 per
tpmC.
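The same arithmetic, written out as a trivial Python calculation using the figures from the example above:

# Price/performance as described above: total priced-system cost divided by
# measured throughput in tpmC (figures taken from the example in the text).
total_system_cost_usd = 859_100   # entire priced configuration
throughput_tpmc = 1_562           # New-Order transactions per minute

price_performance = total_system_cost_usd / throughput_tpmc
print(f"${price_performance:.0f} per tpmC")   # $550 per tpmC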
Q: There are two ways to look at TPC results:
performance and price/performance. Is one more
important and how do I know which system has the
best TPC result?
A: Either performance or price/performance may
be more important, depending on your application.
If your application environment demands very
high, mission-critical performance, then
obviously you may want to give more weight to the
TPC's throughput metric. On the other hand, most
users are shopping within a given price range and
any throughput number must be balanced against
the cost of the system. Generally, the best TPC
results combine high throughput with low
price/performance.
DIGITAL Commentary: DIGITAL runs TPC-C
tests for most models of the DIGITAL Server line.
In general, we find that the Intel-based models
of the DIGITAL Server line are among the
price/performance leaders, and the Alpha-based
models provide additional scalability and faster
compute performance (but not necessarily faster
I/O performance) than the Intel models. Both of
these factors should be considered specific to
the customer's application environment when
making a choice among DIGITAL Server Models.
II.
TPC-D http://www.tpc.org/
DIGITAL
Commentary: TPC-D is a test suite used for
database queries in a data warehousing
environment. DIGITAL currently does not run these
tests under Windows NT; however, results are
posted for data warehousing tests on the AlphaServer
platform running DIGITAL UNIX. DIGITAL has run
tests for a demo database performing sales &
marketing queries on Oracle in an NT environment;
these results compare Intel and Alpha platform
performance and will be published at URL: http://www.digital.com/info/performance.dir.html
DIGITAL evaluated the performance of the
DIGITAL Server family using industry-standard
benchmarks. These benchmarks allow comparisons
across vendors' systems. Performance
characterization is just one "data
point" to be used in conjunction with other
purchase criteria such as features, service, and
price.
For more information on DIGITAL Servers for
Windows NT, visit our web site at
http://www.windows.digital.com/ or contact your
local DIGITAL sales representative. Please send
questions and comments about the information
presented in this Performance Flash to Internet
address: csgperf@zko.dec.com.
DIGITAL believes the information in this
publication is accurate as of its publication
date; such information is subject to change
without notice. DIGITAL is not responsible for
any inadvertent errors. DIGITAL conducts its
business in a manner that conserves the
environment and protects the safety and health of
its employees, customers, and the community.
DIGITAL, the DIGITAL logo, the
AlphaPowered logo, DIGITAL Servers and
AlphaServer are trademarks of Digital
Equipment Corporation.
The Intel Inside logo is a registered
trademark of Intel Corporation.
AIM is a trademark of AIM Technology, Inc.
NetBench, ServerBench, and WebBench are
trademarks of Ziff-Davis Inc.
SPEC, SPECint95, SPECfp95, SPECrate_int95,
and SPECrate_fp95 are trademarks of the
Standard Performance Evaluation Corporation.
TPC-C and TPC-D Benchmarks and tpm-C are
trademarks of the Transaction Processing
Performance Council.
Windows NT and Exchange are trademarks of
Microsoft Corporation.
Lotus and Domino are trademarks of Lotus
Development Corporation.
UNIX is a registered trademark in the
United States and other countries,
exclusively licensed through X/Open Company
Ltd.
SCO UnixWare is a trademark of the Santa
Cruz Operation, Inc.
Copyright 1998 Digital Equipment
Corporation. All rights reserved.