Alpha: The History in Facts and Comments

Dig my grave both long and narrow
Make my coffin neat and strong 

(from an old American song)

Contents
 
 
Foreword
 
Part 1.  PDP and VAX
 
Part 2.  The PRISM Project
 
Part 3.  The Alpha Project
 
Part 4.  EV4, LCA4, EV45, LCA45
 
Part 5.  EV5, EV56, PCA56, PCA57
 
Part 6.  The Fall of DEC
 
Part 7.  EV6, EV67, EV68C, EV68A
 
Part 8.  The Epoch of Compaq
 
Part 9.  EV7, EV79, EV7z, EV8
 
Epilogue
 
 
Additional Information
 
Literature
Paul V. Bolotoff
 
Release date: 14th of April, 2005
Last modification date: 19th of March, 2006

 
 

 
Foreword
 
This work starts a series of articles dedicated to Alpha processors and their architecture, as well as to related areas. It is a series because presenting all the available material in a single overview would be problematic and, in the author's view, inadequate. Besides, the subject is vast and fundamental in many respects, and there are no preceding papers comparable to this one in completeness and scope, covering the whole architecture rather than separate products far apart in time. Perhaps this article would have looked better had it been written and published several years ago, when Alpha processors were the undisputed performance kings and their future was expected to be very bright. However, only now does it seem the right time to draw the final line, to explain what happened, and why one of the most interesting and promising computer architectures has been consigned to oblivion.
 
Generally, this paper is a historical overview with some elements of analysis, and it should be considered as such. It doesn't claim to be universal, though it contains a great deal of reference information. On the other hand, it definitely isn't an obituary or a funeral prayer...
 
 
 
Part 1. PDP and VAX
 
 
Digital Equipment Corporation (abbreviated to DEC) was founded in 1957 by two engineers, Kenneth Olsen and Harlan Anderson, graduates of the Massachusetts Institute of Technology, and was one of the oldest and best-known companies in the world computer industry.
 
Before founding the company, Olsen worked at the institute's Lincoln Laboratory, which was funded by the Department of Defense of the United States, and participated in the development of TX-2, one of the world's first transistor-based computers. Initially the company produced and sold backplane modules for computers, but in 1960 it offered its first computer of its own, the 18-bit PDP-1 (Programmable Data Processor - 1), capable of about 100 thousand operations per second. Incidentally, that machine ran one of the earliest known computer games, Spacewar! by Steve Russell. The 12-bit PDP-8, introduced in 1965, deserves to be called the first "minicomputer" (about the size of a small wardrobe) manufactured in quantity. Its price was attractive as well: about 18000 USD (1965) for the standard configuration. Thanks to an excellent price/performance ratio, PDP-8 was a real competitor to IBM's famous mainframe systems. About 1450 machines were produced until 1968 (not counting the numerous modifications that followed). The 36-bit PDP-10 was ready in the same 1968; it was based on the design of the experimental PDP-6 and targeted at data processing centres, research laboratories, and military needs. Different versions of PDP-10 were manufactured until 1983. Attempts to improve that 36-bit architecture were made within the Unicorn project, supervised by Leonard Hughes and David Rogers, but the project was closed in June of 1975, and all its resources were transferred to support another, 32-bit, architecture.
 
The 16-bit PDP-11 went into production at the beginning of the 1970s. It was DEC's first computer to use 8-bit bytes, and a direct successor of the PDP-8 line. Thanks to a simple and successful Unibus-based architecture (or a modified one based on Q-bus), a quite efficient instruction set, and low production costs, the PDP-11 line was a great success. Quite predictably, PDP-11 was cloned all over the world, even in the "countries of people's democracy": SM-4 (the USSR, Bulgaria, Hungary), SM-1420 (the USSR, Bulgaria, the German Democratic Republic), SM-1600 (the USSR), IZOT-1016 (Bulgaria), DVK (the USSR). Many operating systems were developed for PDP-11: DEC offered P/OS, RSX-11, RT-11, RSTS/E and several derivatives of DOS, and the first edition of UNIX was completed at Bell Laboratories in 1971 on PDP-7 and PDP-11 machines, written in their assembly languages. PDP-11 left the market during the 1980s for a single but inevitable reason: lack of address space. A new architecture, 32-bit though still CISC, was brought to the market.
 
That architecture was VAX (Virtual Address eXtension), officially approved at a VAX Architecture Committee session in April of 1975. It was developed within several months as part of the Star project supervised by Gordon Bell, in parallel with the Unicorn project mentioned above. Upon completion of both projects, it was decided to cancel any further development of 36-bit systems and to concentrate the available resources on 32-bit VAXen (the plural of VAX). In fact, the Star project was meant to justify widening the PDP-11's general registers to 32 bits, increasing their number from 8 to 16, and significantly redesigning the instruction set. The first VAX machine, model 11/780, was announced in October of 1977. A few months later, in February of 1978, a new operating system for VAXen was released: VMS (Virtual Memory System) v1.0. It was a multi-user, multi-tasking OS supporting up to 64Mb of main memory, networking (DECnet), an adaptive task scheduler, extended process management, and many other innovations rarely seen before. Renamed VAX/VMS, v2.0 was presented in April of 1980 with numerous improvements over v1.0. In addition, UNIX was soon ported to VAX. VAXen were manufactured and sold with great success during the 1980s, and were still shipped in limited quantities under special contracts close to the end of the century. The whole line included several dozen models, ranging from compact workstations to 6-processor mainframe-class servers. Even nowadays, thousands of VAXen keep working at divisions of the Department of Defense and the NSA (National Security Agency), as well as at numerous commercial organisations. Nevertheless, the VAX era was the 1980s, if only because DEC bet on a new architecture in the 1990s.

 
[Figures: VAX-11/780 brochure cover; VMS sales update cover]

 
 
Part 2. The PRISM Project
 
At the beginning of the 1980s, DEC was at the peak of its financial prosperity, mostly thanks to high revenues from constantly growing sales of VAX machines. However, nothing lasts forever, and it was obvious that some day VAX would have to leave the market in favour of a new architecture, as had happened with PDP-11. At that time many companies paid more and more attention to RISC concepts and implementations, and DEC had no intention of ignoring that trend. Between 1982 and 1985, several groups inside DEC actively researched the RISC area:
  • Titan, a high-speed design by Western Research Laboratory (DECwest) in Palo Alto (California), supervised by Forest Baskett, since 1982;
     
  • SAFE (Streamline Architecture for Fast Execution), supervised by Alan Kotok and David Orbits, since 1983;
     
  • HR-32 (Hudson RISC 32-bit), located at DEC's factory in Hudson (Massachusetts), supervised by Richard Witek and Daniel Dobberpuhl, since 1984;
     
  • CASCADE by David Cutler in Seattle (Washington), since 1984.

In 1985, following Cutler's initiative to create a so-called corporate RISC plan, all 4 projects were merged into one, PRISM (PaRallel Instruction Set Machine), and the first draft of a new RISC processor was released in August of 1985. It is worth noting that in those years DEC was also involved with the MIPS R3000 processor and even initiated the creation of the Advanced Computing Environment consortium to promote that architecture.
 
No wonder the processor inherited many features of the MIPS architecture during development, but at the same time the differences were obvious. All instructions had a fixed length of 32 bits: the upper 6 and the lower 5 bits held the operation code, and the remaining 21 bits were reserved for immediate data or addressing. There were 64 primary 32-bit general-purpose registers (MIPS had 32), 16 additional 64-bit vector registers, and 3 control registers for vector operations: two 7-bit ones (vector length and vector count) and one 64-bit one (vector mask). There was no processor status register, so the result of comparing two scalar operands was placed into a general-purpose register, while the result of comparing two vector operands went into the vector mask. There was no built-in floating-point unit. A set of special instructions (Epicode, or Extended processor instruction code) was implemented in software [through loadable microcode] to handle special tasks required by a particular environment or operating system and not otherwise supported by the standard instruction set. Later, this function was implemented in the Alpha architecture under the name of PALcode (Privileged Architecture Library code).
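
The PRISM instruction layout described above (6 upper and 5 lower bits of operation code around a 21-bit field) can be illustrated with a short Python sketch. It assumes bit 31 is the most significant bit and that the two opcode parts are simply concatenated into one 11-bit code; the helper names and that concatenation are hypothetical, since only the field widths are known here.

```python
def split_prism_word(word: int) -> dict:
    """Split a 32-bit instruction word into the PRISM fields described above.

    Hypothetical layout, assuming bit 31 is the most significant bit:
      bits 31..26 -> upper part of the operation code (6 bits)
      bits 25..5  -> immediate data / addressing field (21 bits)
      bits  4..0  -> lower part of the operation code (5 bits)
    """
    if not 0 <= word < 1 << 32:
        raise ValueError("instruction word must fit in 32 bits")
    return {
        "opcode_hi": (word >> 26) & 0x3F,
        "payload": (word >> 5) & 0x1FFFFF,
        "opcode_lo": word & 0x1F,
    }


def prism_opcode(word: int) -> int:
    """Join both opcode parts into one 11-bit code (an illustrative assumption)."""
    fields = split_prism_word(word)
    return (fields["opcode_hi"] << 5) | fields["opcode_lo"]
```

Because every field sits at a fixed position, extracting it costs one shift and one mask, which is exactly what makes fixed-length formats cheap to decode.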
 
In 1988, while the project was still in progress, DEC's top management decided to close it, considering any further support a waste of resources. In protest against that decision, Cutler resigned and went to Microsoft to lead the department developing Windows NT (called OS/2 3.0 at the time).
 
At the beginning of 1989, DEC presented its first RISC-powered workstations: DECstation 3100 with a 32-bit MIPS R2000 clocked at 16MHz, and DECstation 2100 with the same processor clocked at 12MHz. Both machines ran Ultrix and were rather inexpensive (about 8 thousand USD (1990) for DECstation 2100).
 
 
 
Part 3. The Alpha Project
 
In 1989, the aging VAX architecture was hardly able to compete with 2nd-generation RISC architectures such as MIPS and SPARC, and it was obvious that the next generation of RISC hardware would leave VAX little chance to survive. In the middle of 1989, DEC's engineers were tasked with creating a competitive RISC architecture with long-term potential that would at the same time have a minimal set of incompatibilities with VAX. This was because VAX/VMS and all accompanying applications had to be ported to the new architecture, which was also defined as 64-bit from the start, since competitors were about to release their own 64-bit solutions. A development group was formed with Richard Witek and Richard Sites as the chief architects.
 
The Alpha architecture was officially mentioned for the first time on the 25th of February 1992 during a conference in Tokyo. In addition, most key features of the new architecture were listed in a concise overview posted to comp.arch, a USENET newsgroup. It was also mentioned that "Alpha" was an internal code-name and that an official name would be provided later. The new processor was a "clean" 64-bit RISC design executing fixed-length 32-bit instructions, with 32 64-bit integer registers, and it operated with 43-bit virtual addresses (with the possibility of expanding to 64 bits in future implementations of the architecture). Like VAX, it used little-endian byte order, i.e. the low byte of a register is stored at the lowest memory address, as opposed to big-endian byte order, used by Motorola and by many other processor architectures, where the low byte of a register is stored at the highest address. A floating-point unit was built into the core together with 32 64-bit floating-point registers with random access, unlike the primitive stack-based access of Intel x87 coprocessors. The total lifetime of the new architecture was estimated at no less than 25 years.
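
The difference between the two byte orders is easiest to see by storing one 64-bit register value both ways. This minimal Python sketch uses the standard `struct` module; `store_bytes` is a hypothetical helper that lists the stored bytes from the lowest memory address upward.

```python
import struct


def store_bytes(value: int, little_endian: bool) -> list[int]:
    """Bytes written to memory, lowest address first, when a 64-bit
    register value is stored in the given byte order."""
    fmt = "<Q" if little_endian else ">Q"   # '<' little-endian, '>' big-endian
    return list(struct.pack(fmt, value))


reg = 0x0102030405060708
print(store_bytes(reg, little_endian=True))    # Alpha and VAX order
print(store_bytes(reg, little_endian=False))   # big-endian order
```

In the little-endian case the least significant byte (0x08) lands at the lowest address; in the big-endian case it lands at the highest.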
 
The instruction set was simplified to make pipelining as easy as possible and consisted of 5 groups:
  • integer instructions;
  • floating-point instructions;
  • branch and compare instructions;
  • load and store instructions;
  • PALcode instructions.

Notably, there was no hardware integer divide instruction: division would have been the most computationally expensive integer operation and thus hard to pipeline, so it was emulated in software. This was acceptable because such operations are needed relatively rarely in real code.
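
A software routine can replace a divide instruction with a loop of shifts, compares and subtracts. The sketch below shows classic restoring (shift-subtract) division for unsigned 64-bit operands in Python; it illustrates the general technique only and is not the routine that DEC's compilers or libraries actually used, which are not described here. The name `udiv64` is hypothetical.

```python
MASK64 = (1 << 64) - 1


def udiv64(dividend: int, divisor: int) -> tuple[int, int]:
    """Unsigned 64-bit division using only shifts, compares and subtracts.

    Returns (quotient, remainder). Classic restoring division: one quotient
    bit is produced per iteration, from the most significant bit down.
    """
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    dividend &= MASK64
    divisor &= MASK64
    quotient = 0
    remainder = 0
    for bit in range(63, -1, -1):
        # Bring the next dividend bit down into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1 << bit
    return quotient, remainder
```

The loop needs 64 dependent iterations, one per quotient bit, which gives a feel for why a divider fits poorly into a short, fully pipelined integer unit.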
 
The Alpha architecture was a "real" RISC, in contrast to modern x86 processors, which are RISC only inside. The conceptual difference between RISC (Reduced Instruction Set Computing) and CISC (Complex Instruction Set Computing) lay (and still lies) in a few points:

 
Feature            | CISC                                        | RISC
Instruction length | Variable, depends on the instruction type   | Fixed, independent of the instruction type
Instruction set    | Wide, adapted to the programmer's needs     | Balanced, adapted to the processor's execution convenience
Memory access      | Allowed for different kinds of instructions | Allowed for load/store instructions only
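
Because every Alpha instruction is 32 bits long and its fields sit at fixed bit positions, decoding reduces to a few shifts and masks. The Python sketch below decodes the Alpha memory-instruction format (opcode in bits 31..26, registers Ra and Rb in bits 25..21 and 20..16, signed 16-bit displacement in bits 15..0); that layout is taken from the Alpha architecture documentation rather than from this article, and the helper names are hypothetical.

```python
def decode_alpha_memory_format(word: int) -> dict:
    """Decode a 32-bit word using the Alpha memory-instruction layout."""
    disp = word & 0xFFFF
    if disp & 0x8000:          # sign-extend the 16-bit displacement
        disp -= 1 << 16
    return {
        "opcode": (word >> 26) & 0x3F,
        "ra": (word >> 21) & 0x1F,
        "rb": (word >> 16) & 0x1F,
        "disp": disp,
    }


def encode_alpha_memory_format(opcode: int, ra: int, rb: int, disp: int) -> int:
    """Inverse of the decoder above, handy for building test words."""
    if not -0x8000 <= disp <= 0x7FFF:
        raise ValueError("displacement must fit in 16 signed bits")
    return (((opcode & 0x3F) << 26) | ((ra & 0x1F) << 21)
            | ((rb & 0x1F) << 16) | (disp & 0xFFFF))
```

A CISC decoder, by contrast, must first work out how long the instruction is before it can even locate the next one.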

The processor was planned to enter production at a very high core frequency of 150MHz, later to be raised to 200MHz within the same engineering limits. This appeared possible thanks to a successful architecture as well as the engineers' refusal to use automated design tools, doing all the work by hand instead.
 
The project entered the manufacturing stage and was soon reorganised into a regular division of DEC.
 
Thanks to the efforts of DEC's marketing department, the new architecture was called AXP (or Alpha AXP), though it is still not known for sure what exactly this abbreviation meant. Quite possibly nothing at all: in the past, DEC had had legal problems with its VAX brand because of another claimant, a manufacturer of vacuum cleaners, and that time the conflict went to court. It was also argued that DEC's equipment sales suffered because of the other company's slogan, "Nothing sucks like a Vax!" Eventually a joke appeared saying that AXP meant "Almost Exactly PRISM".
 
 
 
Part 4. EV4, LCA4, EV45, LCA45
 
The first processor of the Alpha family was called 21064 ("21" implied that Alpha was an architecture of the 21st century, "0" the processor generation, and "64" the computational capability in bits), also code-named EV4 ("EV" was supposedly an abbreviation of "Extended VAX", and "4" the generation of the manufacturing process, CMOS4, where CMOS stands for Complementary Metal Oxide Semiconductor). A prototype of EV4 was ready in 1991, made with a coarser CMOS3 process and therefore with reduced cache sizes and no floating-point unit. Nevertheless, it was an important milestone for tuning and polishing the architecture and the software environment. EV4 was introduced in November of 1992 and was manufactured using a 3-layer 0.75µ process, advanced for those days (later it was moved to 0.675µ CMOS4S, the optical shrink of CMOS4). It was designed for a 3.3V supply, with core frequencies ranging from 150MHz to 200MHz (TDP from 21W to 27W). It consisted of 1.68 million transistors on a die of 233mm² and supported multiprocessing as one of the architecture's key features. Form-factor: PGA-431 (Pin Grid Array).
 
The L1 cache was integrated: 8Kb for instructions (I-cache, instruction cache), direct-mapped, and 8Kb for data (D-cache, data cache), direct-mapped and write-through. The D-cache read latency was 3 ticks. Every I-cache line consisted of 32 bytes of instructions, a 21-bit tag, an 8-bit branch history field, and several auxiliary fields. Every D-cache line consisted of 32 bytes of data and a 21-bit tag. The L2 cache (B-cache, back-up cache) was a recommended option built from external synchronous or asynchronous SRAM chips; it was direct-mapped, write-back, write-ahead and sized up to 16Mb (usually from 512Kb to 2Mb). Every line consisted of 32 bytes of data or instructions with a 1-bit longword parity or 7-bit longword ECC field, a tag of up to 17 bits with an additional longword parity bit, and a 3-bit status field with an additional parity bit. B-cache read and write timings were programmable in processor ticks. The system data bus was either 64 or 128 bits wide (programmable, with a 1-bit longword parity or 7-bit longword ECC field) and was multiplexed with the B-cache data bus, switching between the two as necessary. The system address bus was 34 bits wide. B-cache was organised to be inclusive of D-cache, i.e. it contained a full copy of the latter. A mechanism called victim write was used to write data from B-cache back to memory. Only the processor could perform read/write operations on B-cache, though the system logic was permitted to read the B-cache tags, which was of top importance, especially in multiprocessor systems, for maintaining cache coherence among all processors in a machine.
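
The 21-bit tags mentioned above follow from the cache geometry. The Python sketch below assumes the caches are indexed and tagged with 34-bit physical addresses (matching the 34-bit system address bus); under that assumption an 8Kb direct-mapped cache with 32-byte lines needs 5 offset bits and 8 index bits, leaving 34 - 8 - 5 = 21 tag bits. The constant and function names are hypothetical.

```python
# Geometry from the text: 8 Kb cache, 32-byte lines, direct-mapped.
CACHE_BYTES = 8 * 1024
LINE_BYTES = 32
PHYS_ADDR_BITS = 34   # assumption: matches the 34-bit system address bus

OFFSET_BITS = (LINE_BYTES - 1).bit_length()              # 5
NUM_LINES = CACHE_BYTES // LINE_BYTES                    # 256
INDEX_BITS = (NUM_LINES - 1).bit_length()                # 8
TAG_BITS = PHYS_ADDR_BITS - INDEX_BITS - OFFSET_BITS     # 21


def split_address(addr: int) -> tuple[int, int, int]:
    """Return (tag, line index, byte offset) for a direct-mapped lookup."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_LINES - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset
```

Two addresses exactly 8Kb apart, such as 0x0040 and 0x2040, get the same index but different tags, so in a direct-mapped cache they keep evicting each other; this is the kind of conflict that set associativity (as in EV45's D-cache) reduces.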
 
The processor had one integer pipeline (E-box, 7 stages) and one floating-point pipeline (F-box, 10 stages). The instruction decoder and scheduler (I-box) could issue up to 2 instructions per tick, in order, to the functional units, namely E-box, F-box, and the load/store unit (A-box). The cache and system bus controller (C-box) worked together with A-box and managed the integrated I-cache and D-cache as well as the external B-cache. Virtual addresses were calculated by E-box. The branch prediction unit maintained a 4096-entry branch prediction table with 2 bits per entry. There was an I-TLB (Instruction TLB) of 8 entries for 8Kb pages and 4 entries for 4Mb pages, and a D-TLB (Data TLB) of 32 entries. Both were fully associative.
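
A table of 2-bit entries like the one in EV4's branch prediction unit is commonly used as an array of saturating counters. The Python sketch below models such a 4096-entry table; the counter encoding, the initial value and the indexing by program-counter bits are assumptions for illustration, not documented details of EV4.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, entries: int = 4096):
        if entries <= 0 or entries & (entries - 1):
            raise ValueError("entries must be a positive power of two")
        self.mask = entries - 1
        self.table = [1] * entries   # start as 'weakly not taken' (assumption)

    def _index(self, pc: int) -> int:
        # Hypothetical indexing: drop the 2 low bits (instructions are 4 bytes).
        return (pc >> 2) & self.mask

    def predict(self, pc: int) -> bool:
        return self.table[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)
```

Because a strongly-taken counter must be wrong twice in a row before its prediction flips, a loop-closing branch that falls through once is still predicted as taken the next time the loop is entered.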

 
[Figures: micrograph of EV4; floor-plan of EV4; EV4 (front); EV4 (back)]

For all its excellent performance, EV4 was too expensive for most potential customers, so a low-priced sibling was released in September of 1993: 21066 (LCA4 or LCA4S). It was based on the EV4 core, with memory and PCI controllers integrated in addition, as well as several secondary functional units. On the other hand, the system data bus was narrowed to 64 bits, which hurt performance. LCA4 was manufactured using a 0.675µ CMOS4S process, resulting in a die even smaller than that of the original EV4 (209mm² compared to 233mm²). Additionally, its clock frequencies were lowered to the range from 100MHz to 166MHz, presumably to avoid overheating in the poorly ventilated desktop cases of those days, and also to avoid creating an additional competitor to EV4. It contained 1.75 million transistors and required a 3.3V supply. The design of this processor was licensed to Mitsubishi, which manufactured LCA4 as well (including a 200MHz version).
 
21064A (EV45) was announced at Microprocessor Forum in October of 1993. It was a modified EV4, produced using a 4-layer 0.5µ CMOS5 process. 21066A (LCA45) was presented at COMDEX in November of 1994; it was derived from LCA4 in almost the same way as EV45 was derived from EV4. DEC's marketing people developed a habit of adding a letter to a processor's model name after a redesign for a more advanced process. The cores of EV45 and LCA45 changed only modestly: the I-cache and D-cache of EV45 were doubled in size (16Kb I-cache + 16Kb D-cache) and their data and tag fields gained a parity bit each, the I-cache branch history fields were expanded to 16 bits, the D-cache became 2-way set associative, and a 1-bit byte parity mode was added to the existing integrity modes of the system data bus. In addition, both EV45 and LCA45 received a modified F-box with faster division: EV4 executed a floating-point division in 34 ticks for single-precision operands and in 63 ticks for double-precision operands regardless of operand values, whereas EV45 needed from 19 to 34 ticks for single precision and from 29 to 63 ticks for double precision, depending on operand values. LCA45 was also manufactured by Mitsubishi. Both dies shrank: to 164mm² for EV45 and 161mm² for LCA45. The transistor count increased to 2.85 million for EV45 and remained at 1.75 million for LCA45. Finally, power consumption per tick decreased for both processors, though the supply voltage stayed at 3.3V. Core frequencies of EV45 ranged from 200MHz to 300MHz (TDP from 24W to 36W), and those of LCA45 from 166MHz to 233MHz.
 
DEC developed equipment for the Department of Defense of the USA, so 21068 at 66MHz and 21068A at 100MHz were introduced in 1994. They were derived from LCA4 and LCA45 respectively and adapted for military needs (passive cooling, extreme temperature conditions, etc.).
 
The first chipsets for EV4 supported the TURBOchannel, FutureBus+ and XMI peripheral buses. Although all of them were high-speed designs for their time (about 100Mb/s per bus), they never gained significant support, so only a very limited set of peripherals was available for them. Therefore DEC turned its attention to industry-standard bus architectures such as PCI and ISA (EISA). A new chipset, DEC Apecs, was introduced in 1994 in two editions: for a 64-bit system data bus (21071) and for a 128-bit one (21072). The difference was that 21071 consisted of 4 chips (1 universal controller, COMANCHE; 2 data slices, DECADE; 1 PCI bus controller, EPIC), whereas 21072 consisted of 6 (2 additional data slices). It supported a 33MHz system bus, up to 16Mb of B-cache, up to 4Gb of FPM parity memory with access times from 100 to 50ns (8 banks), and up to 16Mb of dual-ported VRAM for an optional video frame buffer (1 bank). Support for the ISA or EISA bus could be implemented with standard bridges such as i82378IB (ISA) or i82378EB (EISA). Apecs was used in the EB64+ and AlphaPC 64 (code-named Cabriolet) mainboard designs.
 
The first Alpha workstation became available in November of 1992: DEC 3000 Model 500 AXP (code-named Flamingo), with a 150MHz EV4, 512Kb of B-cache, 32Mb of main memory, an integrated 8-bit video controller with 2Mb of VRAM, a 1Gb SCSI HDD, a SCSI CD-ROM, a built-in 10Mbit Ethernet controller (thick coaxial and twisted pair), built-in sound and ISDN controllers, and a 19" monitor (1280x1024 at 72Hz). The price was very impressive: 39 thousand USD.
 
In July of 1994, two EV45-based workstations were announced: DEC 3000 Model 900 AXP and Model 700 AXP (code-named Flamingo45 and Sandpiper45 respectively). The first was powered by a 275MHz processor and the second by a 225MHz one. Both came with 2Mb of B-cache, 128Mb of main memory, a ZLX-family 24-bit video card, FastSCSI peripherals, and the same networking, sound, and ISDN hardware as Model 500 AXP. The first workstation was offered for 43.4 thousand USD, the second for 27.7 thousand USD.

[Figure: drawing of DEC Apecs]

 
 
Part 5. EV5, EV56, PCA56, PCA57
 
DEC unveiled the very first information about the 2nd-generation Alpha processor at the Hot Chips conference in Palo Alto (California), which started on the 14th of August 1994, although the official release of 21164 (EV5) dates to the 7th of September 1994, when DEC issued the corresponding press release. The processor was based on the EV45 core and was an evolution of the latter rather than a revolutionary new design. The number of pipelines, both integer and floating-point, was doubled compared to EV4 or EV45. In addition, the floating-point pipelines were shortened to 9 stages from 10. The integer pipelines were not identical: while both could perform elementary arithmetic and logical operations, only the 1st could multiply and shift, and only the 2nd could process conditional and unconditional branches. Both pipelines could calculate virtual addresses for loads, but only the 1st could do so for stores. The floating-point pipelines differed as well: the 1st could execute any floating-point instruction except multiplies, which were the only instructions the 2nd pipeline could process. I-box could fetch and decode up to 4 instructions per tick to keep the execution units busy. EV5 was manufactured using the same 4-layer 0.5µ CMOS5 process as EV45, required a 3.3V supply, contained 9.3 million transistors (including 7.8 million for the integrated caches), and had a die size of 299mm², very close to the practical limits of the process. Core frequencies ranged from 266MHz to 333MHz (TDP from 46W to 56W). Form-factor: IPGA-499 (Interstitial Pin Grid Array).
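
The pipeline restrictions listed above determine which instructions can be issued together. The following Python sketch encodes them in a table and finds the longest in-order prefix of up to 4 instructions that fits onto distinct pipelines. The pipe labels E0, E1, FA and FM and the brute-force search are choices made for this illustration; the real EV5 issue logic also handles operand dependencies and other rules not modelled here.

```python
from enum import Enum, auto
from itertools import permutations


class Op(Enum):
    INT_ALU = auto()    # elementary arithmetic/logical operation
    INT_MUL = auto()
    SHIFT = auto()
    BRANCH = auto()
    LOAD = auto()
    STORE = auto()
    FP_MUL = auto()
    FP_OTHER = auto()   # any floating-point operation except multiply


PIPES = ("E0", "E1", "FA", "FM")   # hypothetical labels for the 4 pipelines

# Which pipelines can accept each instruction class, per the description above.
CAPABLE = {
    Op.INT_ALU: {"E0", "E1"},
    Op.INT_MUL: {"E0"},
    Op.SHIFT: {"E0"},
    Op.BRANCH: {"E1"},
    Op.LOAD: {"E0", "E1"},
    Op.STORE: {"E0"},
    Op.FP_MUL: {"FM"},
    Op.FP_OTHER: {"FA"},
}


def assign(group):
    """Return a list of distinct pipes for every instruction in group, or None."""
    for perm in permutations(PIPES, len(group)):
        if all(pipe in CAPABLE[op] for op, pipe in zip(group, perm)):
            return list(perm)
    return None


def issue_group(window):
    """Issue the longest in-order prefix (at most 4) that fits on distinct pipes."""
    for n in range(min(4, len(window)), 0, -1):
        pipes = assign(window[:n])
        if pipes is not None:
            return pipes
    return []
```

For example, a store followed by a shift issues alone, because both need the 1st integer pipeline, while a load, a branch, a floating-point add and a floating-point multiply can all go out in the same tick.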
 
The I-cache and D-cache were sized and organised just as in EV4, i.e. 8Kb each. The D-cache remained write-through but was made dual-ported, i.e. it could deliver data for 2 load instructions per tick. Trading transistors for performance, the D-cache physically consisted of 2 absolutely identical 8Kb copies, so data could be read from either one but had to be written to both. The processor had 96Kb of integrated L2 cache (S-cache, secondary cache), write-back and 3-way set associative, which C-box accessed through a dedicated 128-bit data bus. At the same time, B-cache was still supported but remained optional; built from external cache SRAMs, it could be as large as 64Mb, though usually from 1Mb to 4Mb. In other words, EV5 supported 3 cache levels. S-cache was accessed through a 4-stage pipeline: two ticks for tag lookup and update plus two ticks for data access and delivery. Every S-cache line was 64 bytes wide with one tag per line, though each line could also be addressed as two 32-byte sublines. The D-cache read latency was reduced from 3 to 2 ticks, and S-cache could fill a whole line in 7 ticks: 4 ticks for the first 16 bytes (the pipeline described above) and 1 tick for each subsequent 16 bytes. As in EV4, the contents of D-cache were duplicated in the next cache level, but this time in S-cache. In turn, B-cache was inclusive of S-cache regardless of the difference in associativity. The I-TLB held 48 entries (for pages from 8Kb to 4Mb) and the D-TLB 64 entries; the D-TLB became dual-ported for reads, like D-cache. The system data bus was fixed at 128 bits plus 16 bits for ECC protection, still multiplexed with the data path to B-cache. The system address bus was 40 bits wide, and the control bus 10 bits.
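
The 7-tick S-cache figure is simple arithmetic: a 64-byte line crosses the 128-bit (16-byte) path in 4 chunks, the first of which costs 4 ticks and each of the remaining 3 one more tick. The Python sketch below generalises that formula; the function name is hypothetical, and applying it to a 32-byte subline (giving 5 ticks) is an extrapolation, not a figure stated above.

```python
def scache_fill_ticks(line_bytes: int = 64, bus_bytes: int = 16,
                      first_chunk_ticks: int = 4, next_chunk_ticks: int = 1) -> int:
    """Ticks to deliver a full line: the first bus-width chunk pays the
    pipelined access latency, each following chunk arrives one tick later."""
    if line_bytes % bus_bytes:
        raise ValueError("line size must be a multiple of the bus width")
    chunks = line_bytes // bus_bytes
    return first_chunk_ticks + (chunks - 1) * next_chunk_ticks
```

With the defaults, `scache_fill_ticks()` gives 4 + 3 × 1 = 7 ticks for a full 64-byte line.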

 
[Figures: micrograph of EV5; floor-plan of EV5]

21164A (EV56) was introduced at Microprocessor Forum in October of 1995. It was a modified EV5, shrunk to a 4-layer 0.35µ CMOS6 process and manufactured at the same Hudson fab (DEC had previously invested about 450 million USD in its modernisation). The most important architectural difference was BWX (Byte-Word Extension), a set of 6 additional instructions to load and store data in 8- or 16-bit quanta. From the start, the Alpha architecture could load and store data only in 32- or 64-bit quanta, which caused certain difficulties when porting or emulating code written for other processor architectures, such as i386 or MIPS. A request to implement BWX in hardware was submitted by Richard Sites in June of 1994 and approved in June of 1995. However, to use BWX, the chipset had to support it as well. EV56 was manufactured with core frequencies from 366MHz to 666MHz (TDP from 31W to 55W), starting from the summer of 1996. It was also produced by Samsung under a licence agreement signed in June of 1996 (the 666MHz version was shipped by Samsung only). It contained 9.66 million transistors, had a die size of 209mm², and required dual voltage (2.5V for the core and 3.3V for input-output circuits).
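
Without BWX, a byte access has to go through the containing aligned quadword. This Python sketch models a little-endian memory that, like pre-BWX Alpha, permits only aligned 64-bit loads and stores, and builds byte loads and stores on top of it; the class and function names are hypothetical.

```python
import struct

MASK64 = (1 << 64) - 1


class Memory:
    """Little-endian byte-addressed memory allowing only aligned quadword access."""

    def __init__(self, size: int):
        self.data = bytearray(size)

    def load_quad(self, addr: int) -> int:
        if addr % 8:
            raise ValueError("unaligned quadword access")
        return struct.unpack_from("<Q", self.data, addr)[0]

    def store_quad(self, addr: int, value: int) -> None:
        if addr % 8:
            raise ValueError("unaligned quadword access")
        struct.pack_into("<Q", self.data, addr, value)


def load_byte(mem: Memory, addr: int) -> int:
    """Byte load without BWX: fetch the containing quadword, then shift and mask."""
    quad = mem.load_quad(addr & ~7)
    return (quad >> (8 * (addr & 7))) & 0xFF


def store_byte(mem: Memory, addr: int, value: int) -> None:
    """Byte store without BWX: read-modify-write of the containing quadword."""
    base, shift = addr & ~7, 8 * (addr & 7)
    quad = mem.load_quad(base)
    quad &= ~(0xFF << shift) & MASK64    # clear the target byte
    quad |= (value & 0xFF) << shift      # insert the new byte
    mem.store_quad(base, quad)
```

On real pre-BWX hardware such sequences were built from instructions like LDQ_U, EXTBL, INSBL and MSKBL. The read-modify-write store is also not atomic on its own, which complicates sharing byte-sized data between processors.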

 
[Figures: micrograph of EV56; EV56 (front); EV56 (back)]

21164PC (PCA56) was introduced on the 17th of March 1997. It was a low-cost version of EV56 designed jointly by DEC and Mitsubishi. The S-cache and its accompanying logic were removed, but the I-cache was doubled in size (to 16Kb). It contained 3.5 million transistors on a 141mm² die and used the same process and voltages as EV56, but the form-factor changed to IPGA-413 instead of IPGA-499. Core frequencies ranged from 400MHz to 533MHz (TDP from 26W to 35W). Later, a 0.28µ 21164PC (PCA57) was manufactured by Samsung, with the I-cache and D-cache doubled in size and the D-cache made 2-way set associative. The transistor count increased to 5.7 million while the die size decreased to 101mm². It required lower voltages: 2.0V for the core and 2.5V for input-output logic. Core frequencies ranged from 533 to 666MHz (TDP from 18W to 23W).
 
In addition to BWX (inherited from EV56), PCA56 and PCA57 supported a new instruction set, MVI (Motion Video Instructions), intended to accelerate video and audio processing using the SIMD (Single Instruction, Multiple Data) approach, somewhat comparable to the MMX instruction set for i386 processors.
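
A typical SIMD operation for video work compares eight 8-bit pixels packed into one 64-bit word. MVI includes an instruction named PERR for a sum of absolute byte differences; the Python sketch below only reproduces that kind of result with a loop over the eight packed bytes and makes no claim about the instruction's exact semantics or timing. `pack8` and `perr` are hypothetical helper names.

```python
def pack8(pixels) -> int:
    """Pack eight 8-bit pixel values into one 64-bit word, lowest byte first."""
    if len(pixels) != 8 or any(not 0 <= p <= 255 for p in pixels):
        raise ValueError("need exactly eight values in 0..255")
    return sum(p << (8 * i) for i, p in enumerate(pixels))


def perr(a: int, b: int) -> int:
    """Sum of absolute differences of the eight unsigned bytes in two words,
    a pixel-error metric commonly used in motion estimation."""
    total = 0
    for i in range(8):
        x = (a >> (8 * i)) & 0xFF
        y = (b >> (8 * i)) & 0xFF
        total += abs(x - y)
    return total
```

A scalar processor needs roughly eight extract/subtract/accumulate steps for what a SIMD instruction does at once, which is where the speed-up for video code comes from.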
 
The first standard chipset developed for EV5 was DEC Alcor (21171). It supported a 33MHz system bus, up to 64Mb of B-cache, up to 8Gb of main memory (FPM ECC, using a 256-bit wide memory data path), and a 64-bit PCI bus at 33MHz. Support for the ISA or EISA bus could be added through a standard bridge, as before. There was no built-in IDE controller; one could be installed separately using third-party hardware. Physically, the chipset consisted of 5 chips: 1 universal controller with PCI bus support (Control, I/O and Address, CIA) and 4 data switches (DSW). A new release, Alcor 2 (21172), was completed after EV56 entered production and added BWX support. It was soon followed by Pyxis (21174), a single-chip solution supporting a 66MHz system bus and 66MHz SDRAM ECC memory accessed through a 128-bit wide memory path. There was also VLSI Polaris, developed for PCA57-based systems.
 
 
 
Part 6. The Fall of DEC
 
On the 26th of January 1998, news flashed all over the computer world that the financially struggling DEC was being purchased by Compaq Computer Corporation, with the deal subject to approval at the upcoming shareholders' meetings of both companies. DEC's shareholders ratified the agreement on the 2nd of February 1998. The price was 9.6 billion USD, compared with DEC's estimated market capitalisation of about 7 billion USD. The integration of DEC's units into Compaq's business structure was finished about half a year later with the legal end of DEC, when its shares were delisted from the New York Stock Exchange on the 11th of June 1998. Notably, negotiations between DEC and Compaq had started in 1995 but ended unsuccessfully in 1996, because DEC's top management insisted on a merger rather than an acquisition. Nevertheless, a question arises: how could it happen that a huge company (in 1989 figures: almost 130 thousand employees and gross revenue of about 14 billion USD per year, i.e. second in the industry only to IBM), with a very high R&D potential and significant manufacturing facilities, was forced to sell itself to a large computer-building company from Texas? There was no definite answer to this question, though various reasons were cited. Let us examine them in detail.
 
Long ago, Kenneth Olsen, a founder of DEC and its president and CEO almost until the end, said that well-engineered products would sell themselves and thus needed no advertising campaigns or other marketing tools. He also remarked that there was no reason anyone would want a computer at home. Perhaps these ideas were correct in the "good old times", when computer equipment was built in limited quantities by professionals for professionals and therefore cost a hefty amount per unit. However, they no longer held towards the end of the 20th century, when computers sold in millions of units per year and an ordinary machine could be assembled in an hour at most with a screwdriver and parts from the nearest computer shop, at less than a tenth of the price of the big machines mentioned above. Besides, nothing prevented anyone from buying a complete working box from that shop, with free delivery. Such an ordinary machine would most likely be bought not by a professional manager who clearly understood what TCO (Total Cost of Ownership) means, but by an aunt Marge or a young prankster Johnny who could not tell a transistor from a resistor, and such customers are certainly not won over by engineering advantages. Mistake #1.
 
At the very beginning of the Alpha architecture's journey, DEC's top management made a major strategic mistake. The first EV4 prototypes were presented at a computer conference in February 1991. Among the attendees were engineers from Apple Computer, who were looking for a new processor architecture to power the company's future computers, and they were impressed by the advantages of EV4. John Sculley, Apple's CEO at the time, met Kenneth Olsen in June of the same year and proposed using DEC's new processor in future Macs. Olsen declined, arguing that the processor was not ready for the market and that the VAX architecture had not yet reached the end of its life. Several months later, rumours said that new Macs would be powered by PowerPC processors from the alliance of Apple, IBM and Motorola. William Demmer, a former vice-president of DEC's VAX and Alpha divisions who resigned in 1995, later said in an interview with Business Week (the 28th of April 1997): "Ken did not want the company's future to run on Alpha." Mistake #2.
 
DEC manufactured Alpha processors, as well as the accompanying chipsets and numerous peripherals, at its own factory in Hudson (Massachusetts). It designed and produced mainboards for OEM and retail sale, but only for desktops and workstations (they were marketed as Evaluation Boards or AlphaPC), and only in a limited range. None of them supported SMP, although almost all of DEC's Alpha servers were multi-processor machines. All these mainboards were very well engineered, but as expensive as the Alpha processors themselves. Their designs were publicly available, so several companies (Aspen, Polywell, Enorex etc.) manufactured fully qualified clones. The only company to develop and produce its own independent designs was DeskStation. In general, DEC clearly gave priority to producing its own workstations and servers rather than to supplying the component market for such machines. It is possible to survive with such an approach, but not to conquer the market and bring the architecture to the masses. Mistake #3.
 
Despite all its attempts, DEC did not manage to price its products (processors, chipsets and mainboards first of all) affordably for most potential customers. For example, 266MHz and 300MHz EV5 processors were offered at the beginning of 1995 for 2052 and 2937 USD respectively in lots of 1000 units, both enormous prices even taking into account the estimated average manufacturing cost of 430 USD per unit. Measured by price per SPECint92 point, EV5 cost about twice as much as competing RISC designs! At the same time, the standard chipset for EV5 (Alcor) was offered much more cheaply, at 295 USD in lots of 5000 units, though DEC's only Alcor-based mainboard (EB164 with 1Mb of B-cache), bundled with a processor and 16Mb of main memory (which, by the way, was not enough to run most applications even then), carried a list price of about 7500 USD. Mistake #4.
 
Although Alpha was declared an open architecture right from the start, there was no consortium to develop it. All R&D was handled by DEC itself, occasionally in cooperation with Mitsubishi. In fact, though the architecture was open de jure, most of its important hardware designs were effectively closed de facto and had to be licensed for a fee, if they could be licensed at all. This did nothing to promote the architecture. Soon after the introduction of EV4, DEC's top management offered manufacturing licences to Intel, Motorola, NEC and Texas Instruments, but all these companies were busy with other projects and had little or no interest in EV4, so they declined. Perhaps the terms were also unacceptable, or there were other reasons. Mistake #5.
 
After all, even the fastest computer without an operating system and accompanying software is just an expensive source of noise and heat. DEC targeted its Alpha hardware at Windows NT, Digital UNIX and OpenVMS, in exactly that order of priority. That might not have been bad, but...
 
Windows NT was an operating system designed for end users out of the box, not for programmers (no software development tools were supplied), and was therefore heavily dependent upon precompiled applications, mostly commercial ones. In fact, there were several times fewer software titles for Alpha than for i386. There was FX!32, an excellent emulator and translator of x86 code to Alpha, completed by Anton Chernoff's team in 1996, but useful as it was, it could not avoid a performance loss of at least 40% compared with the same source code compiled natively. Then there were drivers, and FX!32 was of no help there at all. Since very few hardware manufacturers cared enough about the Alpha architecture to release any drivers, users had to rely mostly upon Microsoft and DEC. Finally, Windows NT (3.51 as well as 4.0) was a 32-bit OS even when running on 64-bit Alpha hardware, and so could not exploit it fully. Nevertheless, none of these issues stopped DEC from promoting its Alpha systems with the slogan "Born to run Windows NT". In brief, such an OS should not have been positioned as the primary one for the Alpha architecture, though having it available as an option was a big plus for the architecture, especially on the workstation market. Mistake #6.

 
OpenVMS and Digital UNIX (also known as OSF/1, and later as Tru64 UNIX), two reliable and scalable commercial operating systems by DEC, never became widely popular because of their high prices (for example, over 1000 USD for one copy of Digital UNIX in 1997) and their closed source code. They had other drawbacks as well, such as an even narrower range of supported hardware than Windows NT, but if either of these OSes had been released freely together with DEC's excellent development tools, it could have increased the Alpha architecture's market share considerably. Mistake #7.

 
DEC did not support free open-source operating systems, although the very first of them, NetBSD, was ported to Alpha in 1995, followed by Linux, OpenBSD and FreeBSD. This was strange, to say the least, because these OSes were (and still are) very popular in the Alpha world, and their market value was easy to see even then and was growing constantly. Besides, these OSes performed no worse than the commercial Digital UNIX or OpenVMS, offered hardware support comparable to Windows NT (much better nowadays), and had many other benefits you may expect from open-source software. Mistake #8.

 
The list of DEC's strategic mistakes could go on, including complete disregard of the revolution of cheap mass-market personal computers, an over-diversified business model, and others that were less important and not directly related to the Alpha architecture. The author's conclusion is therefore as follows: DEC made considerable efforts to earn as much money as possible from the Alpha architecture, but almost none to help the architecture itself.
 
Prompted by the company's numerous failures in the late 1980s and early 1990s, the board of directors removed Olsen from managing the corporation in June 1992 and appointed Robert Palmer instead. In 1994 Palmer made a determined attempt to reorganise the company's management model, replacing the existing "matrix" model (in which functionally different departments cooperated to make a decision) with a traditional "vertical" one (with authority and responsibility clearly defined from the very top to the very bottom of the company). From 1991 to 1994, DEC's net losses exceeded 4 billion USD, including 2 billion from July 1993 to June 1994 alone (of which 1.2 billion was spent on restructuring). The workforce was reduced to 85,000. Under Palmer's programme, the company was to get rid of many divisions considered non-essential, and a large-scale sell-off began. In July 1994, the Storage Business Unit (manufacturing disk and tape drives) was sold to Quantum for 400 million USD, soon after the fiasco of its first thin-film hard drives (RA90 and RA92), which reached the market too late because of design flaws and did not survive the competition. In August 1994, the Database Software Unit was sold to Oracle for 100 million USD, and DEC's 7.8% stake in the Italian company Olivetti was sold for 140 million USD. In November 1997, a deal was arranged to transfer the Network Product Business Unit to Cabletron for 430 million USD.
 
The fall of DEC was a noisy one. In May 1997, DEC sued Intel, accusing it of infringing 10 Alpha-related patents in designing the Pentium, Pentium Pro and Pentium II processors. In September 1997, Intel countersued, claiming that 14 of its patents had been infringed in the design of Alpha processors. A settlement was finally reached on the 27th of October 1997: both companies withdrew their complaints; DEC licensed Intel to manufacture all of its available hardware (except the Alpha line) and agreed to support the future IA-64 architecture; and Intel bought DEC's factory in Hudson, together with design centres in Jerusalem (Israel) and Austin (Texas), for 625 million USD, and agreed to manufacture DEC's Alpha processors in the future. In addition, the companies signed a 10-year patent cross-licensing agreement. The deal was completed on the 18th of May 1998; by then, Compaq had taken over DEC's main divisions with 38,000 employees, compared with Compaq's own 32,000 before the acquisition, though many of them were laid off soon afterwards.
 
Shortly before the end of DEC and soon after it, many of the leading engineers who had in fact built DEC's realm left for other employers: Dirk Meyer went to AMD to design the K7; James Keller also went to AMD, as an architect of the K8; Daniel Leibholz was hired by Sun to create the UltraSPARC V; and Richard Sites, one of the principal Alpha architects throughout the previous years, also left the ship. Intel was much less fortunate: the StrongARM architecture (inherited from DEC) seemed to be at a dead end, because none of the chief architects who had designed the StrongARM-110, such as Daniel Dobberpuhl, Richard Witek, Gregory Hoeppner and Liam Madden, chose to join the new owner. Moreover, Witek's team in Austin, which had been working on the second generation of the StrongARM core, resigned en masse, so Intel had to design the core literally from scratch, using its own engineers who had previously worked on the i960.
 
back to the contents
 
 
Part 7. EV6, EV67, EV68C, EV68A
 
Although the 21264 (EV6) processor was developed by DEC and was first presented at the Microprocessor Forum in October 1996, the final silicon was completed only by February 1998, when DEC was being liquidated. The processor itself was a significant step forward from EV5 and revolutionary in many respects. One of the most important innovations was out-of-order execution, which required a fundamental redesign of the core and made the functional units less sensitive to the speed of cache and main memory. EV6 could reorder up to 80 instructions on the fly, far more than competing products (Intel's P6 architecture handled up to 40 micro-operations out of order, HP PA-8x00 up to 56, MIPS R12000 up to 48, IBM POWER3 up to 32 and PowerPC G4 up to 5, while Sun UltraSPARC II did not reorder instructions at all). Out-of-order execution was supported by register renaming, so 48 integer and 40 floating-point additional physical registers were implemented (the number of logical, or programmer-visible, registers remained unchanged: 32 integer and 32 floating-point).
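To make register renaming more concrete, here is a minimal Python sketch of a register alias table with 32 logical and 48 extra physical integer registers, matching the figures above. The class and method names are hypothetical, and the model is deliberately simplified: it ignores the two clusters, branch recovery and the floating-point file, so it illustrates the technique rather than EV6's actual circuitry.

```python
from collections import deque


class RenameTable:
    """Toy register alias table (hypothetical model, not EV6's real logic)."""

    def __init__(self, logical: int = 32, extra: int = 48) -> None:
        # Logical register i starts out mapped to physical register i.
        self.mapping = list(range(logical))
        # The extra physical registers form the free list.
        self.free = deque(range(logical, logical + extra))

    def rename(self, dst: int, srcs: list[int]) -> tuple[int, list[int], int]:
        """Rename one instruction 'dst <- op(srcs)'.

        Returns (new physical dst, physical sources, previous mapping of dst).
        The previous mapping may only be freed when this instruction retires.
        """
        # Sources are looked up before dst is remapped (e.g. r1 <- r1 + r2).
        phys_srcs = [self.mapping[s] for s in srcs]
        if not self.free:
            raise RuntimeError("no free physical register: renaming must stall")
        old = self.mapping[dst]
        new = self.free.popleft()
        self.mapping[dst] = new
        return new, phys_srcs, old

    def retire(self, old_phys: int) -> None:
        """Return a no-longer-needed physical register to the free list."""
        self.free.append(old_phys)


rt = RenameTable()
p1, _, _ = rt.rename(1, [2, 3])      # r1 <- r2 + r3
p4, srcs4, _ = rt.rename(4, [1])     # r4 <- r1 * 2, reads the new r1
p1b, _, old1 = rt.rename(1, [5])     # r1 <- r5, a second write to r1
print(p1, srcs4, p1b, old1)          # 32 [32] 34 32
```

The second write to r1 receives a different physical register (34) from the first (32), so it can complete before the instruction that still reads physical register 32 without corrupting its operand; once the free list is empty, renaming has to stall until older instructions retire.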
 
The number of integer pipelines was increased to 4 (organised in 2 clusters), but they differed somewhat in function: the 2nd pipeline could also multiply (7 ticks per instruction) and shift (1 tick), and the 4th could execute MVI instructions (3 ticks) and shift. All 4 pipelines supported elementary arithmetic and logical operations (1 tick). Each cluster had its own integer register file (80 entries, as mentioned above), but the two copies were kept identical (synchronised). The 1st and 3rd pipelines also took over some A-box tasks by calculating virtual addresses for load/store instructions. The A-box itself worked with the I-TLB and D-TLB (128 entries each), load and store queues (32 entries each), and eight 64-byte buffers (the miss address file) for transactions with the B-cache and main memory. The floating-point pipelines also differed in function: the 1st handled addition (4 ticks), division (12 ticks for single precision and 15 ticks for double precision) and square root (15 and 30 ticks respectively), while the 2nd could only multiply (4 ticks). The square root unit and its instructions were new to the Alpha architecture. As in EV5, the decoder could process up to 4 instructions per tick, and the scheduler distributed them between 2 queues: one for the integer pipelines (I-queue, 20 entries) and one for the floating-point pipelines (F-queue, 15 entries). Besides square root instructions, prefetch instructions were added, as well as instructions to transfer data between integer and floating-point registers.
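The functional asymmetry of the four integer pipelines means that not every group of four instructions can be issued in the same tick. A minimal Python sketch, assuming a simplified model in which each pipeline accepts one instruction per tick and only the capabilities listed above matter; the operation labels ("alu", "addr" for load/store address calculation, and so on) and the function name are hypothetical.

```python
from itertools import permutations
from typing import Optional

# Operation classes each integer pipeline accepts, per the description above
# (pipeline numbering follows the text; "addr" = load/store address calculation).
PIPE_CAPS = {
    1: {"alu", "addr"},
    2: {"alu", "mul", "shift"},
    3: {"alu", "addr"},
    4: {"alu", "mvi", "shift"},
}


def assign_pipes(ops: list[str]) -> Optional[list[int]]:
    """Try to place up to 4 ops (one per pipeline) in the same tick.

    Returns the pipeline chosen for each op, or None if no placement exists.
    Brute force is fine here: there are at most 4! = 24 candidate placements.
    """
    if len(ops) > len(PIPE_CAPS):
        return None
    for pipes in permutations(PIPE_CAPS, len(ops)):
        if all(op in PIPE_CAPS[p] for op, p in zip(ops, pipes)):
            return list(pipes)
    return None


print(assign_pipes(["mul", "mvi", "addr", "addr"]))  # [2, 4, 1, 3]
print(assign_pipes(["mul", "mul"]))                  # None: only pipeline 2 multiplies
print(assign_pipes(["shift", "shift", "shift"]))     # None: only 2 pipelines shift
```

A group containing two multiplies, or three shifts, has to be split across ticks even though four pipelines exist, which is why mixing instruction types mattered for throughput.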
 
The C-box was redesigned significantly and supported only 2 cache levels. The integrated L1 consisted of a 64Kb I-cache and a 64Kb D-cache, both 2-way set associative with 64-byte lines. The D-cache was write-back, but its contents were still duplicated in the B-cache. Because of its larger size and higher associativity, D-cache latency increased to 3 ticks (to/from an integer register) and 4 ticks (to/from a floating-point register). The D-cache remained dual-ported, though, unlike in EV5, it was not built as 2 identical synchronised halves but as a single array clocked at twice the core frequency. The external B-cache of 1Mb to 16Mb, direct-mapped and write-back, used an independent 128-bit bidirectional data bus (with an additional 16 bits of ECC protection) and an independent 20-bit unidirectional address bus. The B-cache was built from late-write SSRAM chips (LW SSRAM), and later from double-data-rate SSRAM (DDR SSRAM). The B-cache clock was programmable from 2/3 to 1/8 of the core frequency. Unlike in previous generations of Alpha processors, the B-cache was not optional. The system data bus was only 64 bits wide (with an additional 8 bits of ECC protection) and bidirectional, but used the DDR technique. The system address bus was 44 bits wide, implemented physically as two 15-bit unidirectional paths without DDR support. The system control bus was 15 bits wide, also without DDR support. The basic principle of the system bus changed: it became dedicated instead of shared, so every processor had its own path to the chipset.
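The cache geometry above translates directly into how an address is split into tag, set index and byte offset. A minimal Python sketch; the function names are hypothetical, the example address is arbitrary, and the 64-byte line size for the B-cache is an assumption carried over from the L1 caches. It models only the arithmetic, not how EV6 handled virtual versus physical address bits.

```python
def cache_geometry(size: int, ways: int, line: int) -> tuple[int, int, int]:
    """Return (number of sets, offset bits, index bits) for a cache.

    All values must be powers of two (true for the caches described above).
    """
    sets = size // (ways * line)
    for v in (sets, ways, line):
        if v <= 0 or v & (v - 1):
            raise ValueError("sizes must be powers of two")
    return sets, line.bit_length() - 1, sets.bit_length() - 1


def split_address(addr: int, size: int, ways: int, line: int) -> tuple[int, int, int]:
    """Split an address into (tag, set index, byte offset)."""
    sets, offset_bits, index_bits = cache_geometry(size, ways, line)
    offset = addr & (line - 1)                   # byte within the 64-byte line
    index = (addr >> offset_bits) & (sets - 1)   # which set to look in
    tag = addr >> (offset_bits + index_bits)     # compared against each way's tag
    return tag, index, offset


KB, MB = 1024, 1024 * 1024
print(cache_geometry(64 * KB, 2, 64))          # (512, 6, 9): 64Kb 2-way L1 D-cache
print(split_address(0x12345, 64 * KB, 2, 64))  # (2, 141, 5)
print(cache_geometry(4 * MB, 1, 64))           # (65536, 6, 16): 4Mb direct-mapped B-cache
```

Doubling associativity halves the number of sets and moves one bit from the index into the tag; the larger tag comparison across two ways is part of why the 64Kb 2-way D-cache was slower than EV5's small direct-mapped L1.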
 
The branch prediction logic was redesigned completely. It combined two predictors: a two-level local predictor, with a local history table of 1024 entries 10 bits each and a local prediction table of 1024 3-bit counters, and a global predictor of 4096 2-bit counters indexed by a 12-bit history of the most recent branches. The two algorithms worked independently: the local one tracked the history of each individual branch, while the global one tracked sequences of recent branches. A separate choice predictor of 4096 2-bit entries learned which of the two had been more accurate and supplied the decision when their predictions differed. This combined approach achieved better results than either predictor used on its own.
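The interplay of the local, global and choice predictors can be sketched in Python. This is a minimal model under stated assumptions: the table sizes follow the paragraph above (with the global counters modelled as 2-bit, following published descriptions of the 21264), but the class name, the way tables are indexed, the initial counter values and the update rules are illustrative simplifications, not a description of the actual circuitry.

```python
class TournamentPredictor:
    """Simplified sketch of a 21264-style tournament branch predictor."""

    def __init__(self) -> None:
        self.local_hist = [0] * 1024   # 10-bit per-branch histories
        self.local_ctr = [3] * 1024    # 3-bit counters (0..7), >= 4 means taken
        self.global_ctr = [1] * 4096   # 2-bit counters (0..3), >= 2 means taken
        self.choice = [1] * 4096       # 2-bit counters, >= 2 means trust global
        self.ghist = 0                 # outcomes of the last 12 branches

    @staticmethod
    def _sat(value: int, delta: int, hi: int) -> int:
        """Saturating add, clamped to the range 0..hi."""
        return min(hi, max(0, value + delta))

    def _components(self, pc: int) -> tuple[int, bool, bool]:
        slot = (pc >> 2) & 1023        # Alpha instructions are 4 bytes long
        lh = self.local_hist[slot]
        return slot, self.local_ctr[lh] >= 4, self.global_ctr[self.ghist] >= 2

    def predict(self, pc: int) -> bool:
        _, local_taken, global_taken = self._components(pc)
        return global_taken if self.choice[self.ghist] >= 2 else local_taken

    def update(self, pc: int, taken: bool) -> None:
        slot, local_taken, global_taken = self._components(pc)
        lh, g = self.local_hist[slot], self.ghist
        # The chooser learns only when the two predictors disagree.
        if local_taken != global_taken:
            step = 1 if global_taken == taken else -1
            self.choice[g] = self._sat(self.choice[g], step, 3)
        d = 1 if taken else -1
        self.local_ctr[lh] = self._sat(self.local_ctr[lh], d, 7)
        self.global_ctr[g] = self._sat(self.global_ctr[g], d, 3)
        # Shift the outcome into both histories.
        self.local_hist[slot] = ((lh << 1) | int(taken)) & 0x3FF
        self.ghist = ((g << 1) | int(taken)) & 0xFFF


def misses_per_period(reps: int = 40) -> list[int]:
    """Run a loop branch (taken 7 times, then not taken) and count misses per period."""
    bp = TournamentPredictor()
    pattern = [True] * 7 + [False]
    result = []
    for _ in range(reps):
        misses = 0
        for taken in pattern:
            misses += bp.predict(0x1000) != taken
            bp.update(0x1000, taken)
        result.append(misses)
    return result


print(misses_per_period())
```

For a loop branch that is taken seven times and then falls through, both component predictors learn the pattern within a few iterations, after which mispredictions drop to zero; the chooser earns its keep on branches where one predictor is systematically better than the other.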
 
With so many advanced functional units and other complications in EV6, the clock subsystem had to be redesigned entirely. More efficient clock distribution allowed the core to reach the frequencies of the much simpler EV56 core on almost the same manufacturing process. The clock subsystem of EV6 consumed about 32% of total core power, compared with about 25% for EV56, about 37% for EV5 and about 40% for EV4.

Clock driver placements for Alpha CPUs

 
EV6 was manufactured using the same process as EV56, but with 2 additional metal layers. It consisted of 15.2 million transistors (about 9 million of them in the I-cache, D-cache and branch predictors), had a die size of 314mm² and required a 2.1V to 2.3V supply. Core frequencies ranged from 466MHz to 600MHz (TDP approx. 80W to 110W). Form factor: PGA-587 (Pin Grid Array).

 
Micrograph of EV6 Floor-plan of EV6
EV6 (front) EV6 (back)

The 21264A (EV67) entered the market at the end of 1999. It was produced by Samsung using a 0.25µ CMOS7 process, had a die size of 210mm² and required a lower supply voltage of 2.0V. There were no architectural differences from EV6. Core frequencies ranged from 600MHz to 833MHz (TDP approx. 70W to 100W), which allowed Alpha to regain the lead in integer tasks, lost shortly before to Intel Pentium III and AMD Athlon.
 
The first samples of the 21264B (EV68C) were delivered at the beginning of 2000. It was produced by IBM using a 0.18µ CMOS8 process with copper interconnects. Although there were still no architectural changes, the new process allowed core frequencies to rise as high as 1250MHz. In 2001, Samsung began volume production of the 21264B (EV68A) in its own 0.18µ process, but with aluminium interconnects; the die size shrank to 125mm² and the supply voltage to 1.7V, and core frequencies ranged from 750MHz to 940MHz (TDP approx. 60W to 75W). In September 1998 it had been announced that Samsung's EV68 would be implemented in an innovative 0.18µ FD-SOI (Fully Depleted Silicon-On-Insulator) process with copper interconnects, which was expected to let EV68 reach 1.5GHz and beyond. Unfortunately, this never happened.

 
Prototype of EV68A (front) Prototype of EV68A (back)

Various sources mention the 21264C and 21264D, code-named EV68CB and EV68DC, manufactured by IBM using the same technology as EV68C and running within the same frequency range, so they can be regarded as minor modifications. The only noticeable difference was a new form factor, the pinless CLGA-675 (Ceramic Land Grid Array), instead of PGA-587.
 
Initially, 2 chipsets were designed for the 21264 series processors: DEC Tsunami (21272, also known as Typhoon) and AMD Irongate (AMD-751). There could have been many more, since the 21264 and the Athlon used almost the same system bus, which DEC had licensed to AMD.
 
DEC Tsunami was a highly scalable chipset. It could be used to build single-processor, dual-processor and quad-processor systems with a memory path from 128 to 512 bits wide (registered SDRAM ECC, 83MHz) and one or several PCI buses (64-bit, 33MHz). This flexibility came from splitting the chipset into separate components: system bus controllers (C-chips, one per processor), memory bus controllers (D-chips, one per 64 bits of memory path width) and PCI bus controllers (P-chips, one per PCI bus). So it is no wonder that some systems (for example, AlphaPC 264DP) used chipsets consisting of 12 chips...
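The modular structure makes the chip count easy to estimate. A minimal Python sketch that simply applies the per-component rules stated above; the function name is hypothetical, the 64-bit granularity of the memory path is taken from the "one D-chip per 64 bits" rule, and the configuration that reproduces the 12-chip total is an assumption chosen for illustration, not taken from the AlphaPC 264DP documentation.

```python
def tsunami_chip_count(cpus: int, mem_width_bits: int, pci_buses: int) -> int:
    """Hypothetical helper: chips needed for a Tsunami-based system.

    Applies the rules given above: one C-chip per processor, one D-chip per
    64 bits of memory path width, and one P-chip per PCI bus.
    """
    if cpus not in (1, 2, 4):
        raise ValueError("Tsunami systems had 1, 2 or 4 processors")
    if not (128 <= mem_width_bits <= 512 and mem_width_bits % 64 == 0):
        raise ValueError("memory path must be 128 to 512 bits, in 64-bit steps (assumed)")
    if pci_buses < 1:
        raise ValueError("at least one PCI bus is required")
    c_chips = cpus
    d_chips = mem_width_bits // 64
    p_chips = pci_buses
    return c_chips + d_chips + p_chips


print(tsunami_chip_count(1, 128, 1))  # 4: the smallest configuration
print(tsunami_chip_count(2, 512, 2))  # 12: one configuration giving 12 chips (assumed)
```

The D-chips dominate the count: widening the memory path from 128 to 512 bits alone adds six chips, which helps explain why Tsunami systems were so much more expensive than single-chip solutions.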
 
Although AMD Irongate (AMD-751) was developed as a north bridge for Athlon-based mainboards, paired with the AMD Viper (AMD-756) south bridge or a compatible one, it was also used in some Alpha mainboards (to be precise, the UP1000 and UP1100). Being a single-chip solution, it cost much less than DEC Tsunami and consumed much less power. However, it was not the best match for the 21264, because it lacked multi-processing support and had a narrow memory path (64-bit, unbuffered SDRAM ECC, 100MHz). Nevertheless, Irongate was the first Alpha chipset to support the AGP bus.
 
In 2001, Samsung introduced the UP1500 mainboard, a single-processor solution based on the AMD Irongate-2 (AMD-761) north bridge. This mainboard outperformed the UP1000 and UP1100 thanks to its support for more advanced memory technology: Irongate-2 could use either up to 4Gb of registered DDR SDRAM ECC at 133MHz in 4 DIMMs with 2 RAS lines each, or up to 2Gb of unbuffered DDR SDRAM ECC at the same 133MHz in 2 DIMMs with 2 RAS lines each. However, the memory path remained narrow (64-bit), and the UP1500 firmware did not seem to support the less expensive unbuffered ECC memory, nor non-ECC memory.
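The effect of memory path width, clock rate and DDR on these chipsets can be estimated with the usual peak-bandwidth formula: bus width in bytes × clock × transfers per clock. A minimal Python sketch using the nominal figures quoted above; the function name is hypothetical, the results are theoretical peaks (83MHz is taken at face value rather than as 83.3MHz), and sustained bandwidth was lower in practice.

```python
def peak_bandwidth_mb_s(width_bits: int, clock_mhz: float, transfers_per_clock: int = 1) -> float:
    """Theoretical peak in MB/s (10^6 bytes/s): bytes per transfer x transfers per second."""
    return width_bits / 8 * clock_mhz * transfers_per_clock


configs = {
    "Tsunami, 128-bit SDRAM @ 83MHz": (128, 83, 1),
    "Tsunami, 512-bit SDRAM @ 83MHz": (512, 83, 1),
    "Irongate, 64-bit SDRAM @ 100MHz": (64, 100, 1),
    "Irongate-2, 64-bit DDR SDRAM @ 133MHz": (64, 133, 2),  # DDR: 2 transfers per clock
}
for name, (width, clock, transfers) in configs.items():
    print(f"{name}: {peak_bandwidth_mb_s(width, clock, transfers):.0f} MB/s")
```

By this measure Irongate-2's 64-bit DDR path (2128 MB/s) overtakes a 128-bit Tsunami configuration (1328 MB/s) and far exceeds the original Irongate (800 MB/s), but still falls well short of the widest Tsunami setups (5312 MB/s).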
 
back to the contents
 
 
Part 8. The Epoch of Compaq
 
In fact, Compaq bought the remains of DEC for its substantial assembly facilities, its wide distribution network (in 98 countries) and its cross-licensing agreement with Intel (which, for example, made it possible to build 8-processor Profusion servers). The division developing the Alpha architecture, it seems, was not really welcome: Compaq had long built workstations and servers based on Intel processors and was also paying close attention to AMD's. So, in June 1998, Compaq formed an alliance with Samsung to develop the architecture (DEC and Samsung had already signed an agreement in February 1998 that gave Samsung full access to all Alpha-related patents and the right to manufacture existing Alpha processors and even to design new ones on its own). A jointly owned company, API (Alpha Processor Inc.), was founded to promote the architecture (someone, it seems, had drawn the right conclusions from DEC's history). In the summer of 1998, EV6-based systems entered mass production, offering the best price/performance ratio among competing products on the market. Serious problems with Intel's forthcoming Itanium gave good reason to conclude that this situation would persist for some time. Besides Samsung, EV6 processors were manufactured by Intel at its Fab-6 in Hudson, under the final agreement with the late DEC...
 
1999 was a bad year for Compaq because of falling sales in the personal computer market. The most frequently cited reason was that Compaq had underestimated the Internet's potential for promoting and selling PCs, unlike Dell, which adapted its business model and offered the most attractively priced equipment among all the top brands. Compaq's CEO, Eckhard Pfeiffer, resigned after a financial disaster in the 1st quarter of 1999. Trying to reduce losses, Compaq began to scale back its presence in certain areas, and this affected Alpha systems: in May 1999, it announced that the AlphaServer assembly line in Salem (New Hampshire) would soon be shut down.
 
On the 23rd of August 1999, a notorious event took place: Compaq announced that it would discontinue its participation in the development of Windows NT and stop supplying the OS with its Alpha systems. It also laid off a team of about 120 programmers from DEC's former Western Research Laboratory (DECwest) who had been working on this project. According to Compaq's statistics, among the operating systems preinstalled on new Alpha machines, Tru64 held a share of 65%, OpenVMS 35% and Windows NT only about 5%, so there was no reason to keep flogging a dead horse. A week later, Microsoft responded by announcing that no Windows 2000 would be released for Alpha. Since Microsoft had already abandoned support for the PowerPC and MIPS architectures in 1997, together with Motorola and SGI respectively, "the universal OS" was now doomed to be tied to a single architecture, IA-64 aside...
 
In December 1999, Compaq and Samsung signed a memorandum to maintain the leadership of the Alpha architecture in the near future. The two sides agreed to invest 500 million USD in the architecture (Samsung undertook to spend 200 million USD on developing and tuning new manufacturing processes, and Compaq was to spend 300 million USD on designing new server solutions and further developing Tru64 UNIX). In the same month, Compaq and IBM agreed that IBM would manufacture Alpha processors using its own copper-interconnect technology once it was ready. At the same time, Samsung was to remain the primary supplier of Alpha processors. It was not a good year for Compaq, as its share price shows: from 51 USD in February to 28 USD in December, though many analysts said it could have been worse.
 
The year 2000 passed quietly for Compaq. Samsung was unable to get its 0.18µ process up to speed, unlike IBM, which began supplying EV68C to Compaq in limited quantities, so the market had to make do with the considerably slower EV67. The development of the 21364 (EV7, also known as Marvel) was still in progress, while the 21464 (EV8, also known as Araña) was mentioned here and there. The dot-com crash hit Compaq's shares, which fell to 15 USD by December, i.e. 44% down since January. Strange as it may seem, that was a good result: other companies more dependent upon e-commerce lost much more (Gateway 75%, Apple 71%, Dell 65%). The dot-coms themselves were either bankrupt or close to it; Yahoo.com lost 95% of its market value and Priceline.com 97%.
 
At the beginning of 2001, Samsung started to manufacture EV68A in quantity, but the right moment had already been missed. Compaq planned to ship EV68C-based systems (GS-class AlphaServers) and to upgrade those already in production. EV7 was still on its way when something completely unexpected happened: on the 25th of June 2001 ("black Monday"), Compaq announced that it would move all its server lines from Alpha to the IA-64 architecture by 2004. In effect, this meant surrender to Intel and HP. EV8 was cancelled immediately, although some details of its internals had been disclosed at the Microprocessor Forum in October 1999, and EV7 was scheduled for release no earlier than the beginning of 2002. The Alpha Microprocessor Division was then to be disbanded, with most of its staff going to Intel. Samsung and IBM soon stopped producing Alpha processors. Later, the situation became even more interesting: on the 3rd of September 2001, Hewlett-Packard announced its intention to acquire Compaq, which was in some financial difficulty and whose shares were trading at 10 USD in December 2001. The deal was approved by the shareholders' meetings of both corporations, as well as by the governments of the USA and Canada, and was completed in May 2002.
 
On the 21st of October 2001, API (by then renamed API NetWorks) transferred all rights to support Alpha systems (including warranty service) to Microway, the largest builder of Alpha workstations and servers after Compaq and a long-standing partner of the late DEC. API itself left the Alpha market and concentrated on network technologies, development of the HyperTransport bus, and data storage systems.
 
In conclusion, although Compaq avoided many of the mistakes DEC had made, it did not unleash the architecture's full potential. High-performance Alpha systems based on the 21264A and 21264B never got down to the 2000 USD price mark, and the low-cost 21264PC never appeared. The opportunity to produce low-priced mainboards in volume using AMD Irongate was ignored, and the pricey DEC Tsunami (offered by Compaq for over 1000 USD per set in OEM lots) left Alpha systems no chance of entering the mid-range computer category. Other makers of Athlon chipsets did not adapt their products for the 21264, although VIA initially intended to. The AMD Irongate-4 (AMD-762) north bridge, although available since 2001, never appeared in any Alpha mainboard design. Irongate-4 supported 2-way multi-processing and the same memory interface as Irongate-2, and was thus superior to both Irongate and Irongate-2.
 
Compaq also managed to lose the workstation market for Alpha. In fact, Compaq produced only two Alpha workstations: the XP900 (with a 466MHz EV6 and 2Mb of B-cache; code-named Webbrick) and the XP1000 (with a 500MHz EV6 and 4Mb of B-cache, later with a 666MHz EV67; code-named Monet). They were based on DEC Tsunami, though with a relatively narrow 128-bit memory data path. These machines lost out to x86 workstations, which were less powerful but also much cheaper. Their failure, in turn, marked the end of Windows NT on Alpha, since servers mostly ran Digital UNIX or OpenVMS. This could and should be counted against Compaq. For the record, DEC fought desperately for the workstation market and even achieved some success. Compaq achieved nothing in this field and quickly lost everything.
 
back to the contents
 
 
Part 9. EV7, EV79, EV7z, EV8
 
The first news about the architecture of the 21364 (EV7) came from the Microprocessor Forum in October 1998. The processor was to be based on the EV6 core, but with an integrated Direct Rambus DRAM controller (presumably 4-channel) and an integrated L2 cache (1.5Mb, 6-way set associative). It was also stated that no changes to the EV6 core were planned, though there may have been another reason for this: nobody could take on such a hard task, because Compaq employed few chip design engineers. The design was expected to be completed by 2000.
 
HP inherited the Alpha architecture with the acquisition of Compaq, but did not really need that legacy, because it had its own 64-bit PA-RISC architecture (Precision Architecture RISC) and was allied with Intel in developing the IA-64 architecture (i.e. Itanium). So HP's actions regarding the Alpha architecture were limited to selling the EV6/EV67/EV68-based servers inherited from Compaq and launching EV7 into production; EV7 was finally presented in January 2002.
 
As expected, EV7 contained the EV68 core (either unmodified or with minimal changes) plus several additional integrated units: two memory controllers (two Z-boxes, for PC800 Direct Rambus DRAM), a multi-functional router (R-box, for multi-processor support and networking), and a full-speed L2 cache (S-cache, 1.75Mb, 7-way set associative). The data path to the S-cache was 128 bits wide, and the cache itself had significant latency (12 ticks for reads). Both Z-boxes and the R-box were clocked at 2/3 of the core frequency. The memory channels were clocked by the Z-boxes at 1/2 of their frequency (i.e. 1/3 of the core frequency), but used DDR transfers.
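The clock ratios above fix the memory transfer rate as a function of the core clock. A minimal Python sketch; the function name is hypothetical, and 1200MHz is included only as a reference point, not as a claimed shipping speed: at that core clock the channels would run at exactly the 800 million transfers per second implied by the PC800 designation, while the 1150MHz parts give about 767.

```python
def ev7_clocks(core_mhz: float) -> dict[str, float]:
    """Derive EV7 clock domains from the core clock (ratios as described above)."""
    zbox_mhz = core_mhz * 2 / 3        # Z-boxes and R-box run at 2/3 of the core clock
    channel_mhz = zbox_mhz / 2         # memory channels run at 1/2 of the Z-box clock
    return {
        "zbox_mhz": zbox_mhz,
        "channel_mhz": channel_mhz,
        "mt_per_s": channel_mhz * 2,   # DDR: two transfers per channel clock
    }


for core in (1000, 1150, 1200):
    clocks = ev7_clocks(core)
    print(core, {name: round(value, 1) for name, value in clocks.items()})
```

Because the memory interface was tied to the core clock by fixed ratios, faster EV7 parts automatically drove the Rambus channels faster as well.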
 
Every Z-box supported 5 memory channels (4 primary and 1 auxiliary), each 18 bits wide (16 bits for commands/data/addresses and 2 for ECC). The auxiliary channel was optional and could be used to build a fault-tolerant memory array (roughly speaking, like RAID3). For example, when a quad-word (64 bits) was written to memory, it was divided into four 16-bit words, each sent through a dedicated channel, while the auxiliary channel stored a checksum. In addition, every Z-box could hold up to 1024 memory pages open. The total theoretical memory bandwidth of one EV7 was about 12Gb/s. Since every EV7 in a multi-processor system had a memory area of its own, this memory model was called NUMA (Non-Uniform Memory Access), in contrast to traditional SMP (Symmetric Multi-Processing), also known as UMA (Uniform Memory Access), in which all installed processors access a single shared memory area. Thus, every processor in a system (128 maximum) could access memory through its own controllers as well as through other processors' controllers. The R-box handled communication between processors, as well as between a particular processor and its local peripherals. It supported 4 independent channels with a theoretical bandwidth of 6Gb/s each (one per directly connected neighbouring processor), plus 1 additional channel for high-speed input/output transfers.
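The RAID3-like scheme described above can be illustrated with a short sketch. It assumes that the "checksum" on the auxiliary channel is the XOR parity of the four 16-bit words, as in RAID3; the real EV7 hardware encoding may differ, and the per-channel ECC bits are ignored here. The functions `stripe` and `rebuild` are hypothetical names chosen for illustration.

```python
from functools import reduce
from operator import xor
from typing import List, Optional

WORD_BITS = 16
WORD_MASK = (1 << WORD_BITS) - 1
PRIMARY_CHANNELS = 4


def stripe(quadword: int) -> List[int]:
    """Split a 64-bit quad-word into four 16-bit words plus one XOR parity word."""
    if not 0 <= quadword < 1 << 64:
        raise ValueError("quad-word must fit in 64 bits")
    words = [(quadword >> (WORD_BITS * i)) & WORD_MASK for i in range(PRIMARY_CHANNELS)]
    parity = reduce(xor, words)  # assumed content of the auxiliary channel
    return words + [parity]      # index 4 is the auxiliary channel


def rebuild(channels: List[Optional[int]]) -> int:
    """Reassemble the quad-word, recovering at most one lost channel (None)."""
    missing = [i for i, w in enumerate(channels) if w is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can recover only a single lost channel")
    if missing:
        channels = list(channels)
        # The XOR of all surviving words (parity included) equals the lost word
        channels[missing[0]] = reduce(xor, (w for w in channels if w is not None))
    return sum(w << (WORD_BITS * i) for i, w in enumerate(channels[:PRIMARY_CHANNELS]))


original = 0x0123456789ABCDEF
channels = stripe(original)
channels[2] = None  # pretend the third primary channel has failed
print(hex(rebuild(channels)))
```

Losing any single channel, primary or auxiliary, still lets the quad-word be reassembled, which is what makes such an array fault-tolerant; losing two channels at once cannot be repaired with a single parity word.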
 
EV7 processors could be connected to each other in various topologies, but the so-called "torus" and "shuffle" interconnects were usually chosen. The latter was potentially more effective in some situations: for example, in 8-processor systems "shuffle" allowed each processor to be connected directly to 4 others, while "torus" connected it to only 3. Presumably, this difference vanished in systems with 12 or more processors.

Processor interconnections inside an 8-way EV7 system
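The "3 versus 4 neighbours" argument can be checked by counting distinct direct neighbours in a two-dimensional torus where every processor has 4 links (north, south, west, east) with wraparound. The sketch assumes that the 8-processor torus is laid out as 2×4 and that larger systems use 3×4 or 4×4 grids; it does not model the actual "shuffle" wiring, and `torus_neighbours` and `distinct_neighbours` are hypothetical helper names.

```python
from typing import Set, Tuple


def torus_neighbours(rows: int, cols: int, r: int, c: int) -> Set[Tuple[int, int]]:
    """Nodes reached over the 4 links of node (r, c) in a rows x cols torus."""
    links = [((r - 1) % rows, c), ((r + 1) % rows, c),
             (r, (c - 1) % cols), (r, (c + 1) % cols)]
    # Links landing on the same node (or on the node itself) add no new neighbour
    return {node for node in links if node != (r, c)}


def distinct_neighbours(rows: int, cols: int) -> int:
    """A torus is symmetric, so node (0, 0) is representative of every node."""
    return len(torus_neighbours(rows, cols, 0, 0))


for rows, cols in ((2, 4), (3, 4), (4, 4)):
    print(f"{rows}x{cols} torus ({rows * cols} CPUs): "
          f"{distinct_neighbours(rows, cols)} distinct neighbours per CPU")
```

In a 2×4 torus both vertical links of a processor lead to the same neighbour, so one of its 4 links is effectively redundant and only 3 distinct neighbours remain; a shuffle-style rewiring can point that link at a fourth processor instead. From 3×4 upwards every link already reaches a different processor, which is consistent with the guess that the advantage vanishes for 12 or more processors.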

 
EV7 was manufactured using a 7-layer 0.18µ CMOS8 process and consisted of 152 mln. transistors (including 137 mln. for the I-cache, D-cache and S-cache), which resulted in a very large die (397mm²). Prototypes were clocked at 1250MHz (TDP of 155W), though the processors installed in systems produced by HP ran at 1000MHz to 1150MHz. From the engineering point of view, EV7 could not match the previous Alpha processors in the density of functional units on the die, and that drawback limited the reachable core frequencies, increased S-cache latencies, and, in turn, hurt performance.

Floor-plan of EV7

 
In December of 2002, HP issued a press release saying that the first EV7-based servers would be available in January of 2003. EV79 was to follow later (using a 0.13µ SOI process), and no further Alpha processors were planned. In March of 2003, a prototype of EV79 was shown at ISSCC with a die size of 251mm², requiring a 1.2V supply and clocked at 1450MHz (TDP of 100W). However, in October of 2003 news about manufacturing problems leaked out of IBM, and half a year later the processor was finally cancelled.
 
In August of 2004, the last Alpha processor, EV7z, was announced. It was clocked at 1300MHz and manufactured using the same 0.18µ process. Like EV7, it was intended for HP's own products only. It was also stated that Alpha servers and workstations would be sold until 2006 and supported until 2011, but no longer.
 
The cancelled 21464 (EV8) was supposed to succeed EV7, with the number of primary functional units doubled (8 integer and 4 floating-point pipelines) and with 3Mb of S-cache. It was also to implement a new technology, SMT (Simultaneous Multi-Threading), allowing up to 4 software threads to execute concurrently inside a single physical core (presumably, this technology was somehow related to Intel's HyperThreading). The die size was estimated at 420mm² for 250 mln. transistors under a 0.13µ SOI process. The initial implementation was expected to run at 1.8GHz with a 1.1V core voltage (TDP of 150W).
 
 
 
Epilogue
 
At the moment of writing (April of 2005), Alpha systems were still offered, mostly through HP and Microway. The latter even listed relatively inexpensive workstations based upon the 21164A and AlphaPC 164LX for Linux (2000 USD for the standard configuration). Many retired but still working workstations and servers, as well as their parts, were offered through "online flea markets". Most of those systems ran Windows NT, and many of them would accept neither Digital UNIX nor OpenVMS, some not even *BSD (systems with no SRM console available), though it could still be possible to install Linux under ARC/AlphaBIOS. If you intend to purchase an Alpha system, clarify this question before paying, unless you feel you are short of problems.
 
According to the statistics, DEC and Compaq sold about 800 thousand Alpha workstations and servers until June of 2001. There is no exact figure for how many systems were assembled and sold by others, but it is estimated at over 500 thousand.
 
Many people say that the Alpha architecture died on its own. Hopefully, after reading this article, you will have no doubt that it was buried. Alive. Because it was more profitable to do so.
 
There were many cases in history when a poorly crafted product prevailed over a better one. Maybe the first product cost much less than the second. Possibly the second product was promoted far too passively. Maybe the licence fees were incomparable. Anything is possible. Some would admit that marketing boys and girls who promote goods while understanding well their poor functionality exert themselves to the utmost, realising clearly that otherwise their next salary could be the last one paid.
 
Life goes on...
 
 


 
 
Additional Information
 
Here are Alpha-related press releases and announcements by DEC, Compaq, Samsung etc. They were used in writing this article, so the author assumes they may interest readers. In fact, they are history now and thus hard to find elsewhere. The documents are listed in chronological order. If anyone has additional ones, the author would appreciate receiving copies.
 
1. Digital Workstations Set New Mark for Speed, Price/Performance in Open Client/Server Computing (21-July-1994)
 
2. Transcript of HOTCHIPS VI Presentation of the 21164 Microprocessor (18-August-1994)
 
3. Digital Again Extends Performance Leadership with New Generation of Alpha AXP Microprocessors (7-September-1994)
 
4. Digital Microprocessor Posts World Record (7-September-1994)
 
5. Digital's New Alpha 21066A Chip with PCI Puts More Speed into Embedded Applications, Desktop PCs (14-November-1994)
 
6. Digital Extends Alpha Performance Lead with Speed Upgrades to Alpha Microprocessors (2-November-1995)
 
7. Newest Alpha Microprocessor Hits 500MHz, Alpha Tops for Windows NT Visual Computing (8-July-1996)
 
8. The 21264: A Superscalar Alpha Processor with Out-of-Order Execution (24-October-1996)
 
9. Alpha Is Launched into the Volume Windows NT PC Market with Low Cost 21164PC Microprocessor (17-March-1997)
 
10. Alpha 21164 Microprocessor Streaks to 500MHz, Sets New Industry Performance (31-March-1997)
 
11. Digital and Intel Announce Long-Term Agreement to Expand Relationship; Move to Settle Litigation (27-October-1997)
 
12. Compaq to Acquire Digital for $9.6 Billion (26-January-1998)
 
13. Alpha Roadmap Shows New Destinations for '98 (26-January-1998)
 
14. Digital to Break 1,000MHz Barrier with High-Powered New Generation of Alpha Architecture (2-February-1998)
 
15. Digital to Grant Samsung Architectural License for Alpha Technology (9-February-1998)
 
16. Samsung Introduces Alpha 21264, World's Fastest Microprocessor — Sets Standards for 64-bit Visual and Enterprise Computing (6-April-1998)
 
17. Samsung Targets High Performance Entry-Level Windows NT Servers; New Low-Cost Platforms Based on 500MHz to 633MHz Alpha Processors (6-April-1998)
 
18. Digital and Intel Complete Sale of Digital Semiconductor Manufacturing Operations (18-May-1998)
 
19. Alpha Processor, Inc. Debuts to Drive 64-Bit Alpha in High-Volume NT Markets (16-June-1998)
 
20. Samsung Electronics Develops World's First Next-Generation Wafer Processing Technology (14-September-1998)
 
21. Samsung Electronics Develops 0.18um Process Technology for 1GHz CPU (23-October-1998)
 
22. Compaq, API, and Samsung Set Long-Term Growth Strategy for Alpha (13-December-1999)
 
23. Compaq and Intel to Accelerate Enterprise Server Roadmaps (25-June-2001)
 
24. Compaq Unveils the AlphaServer ES45, Industry's Most Powerful Mid-Range Server (16-October-2001)
 
25. Microway Named Master Distributor and Exclusive Service Provider for API NetWorks' Alpha-Based Product Line (24-October-2001)
 
26. HP Introduces Most Powerful Generation of AlphaServer Systems (20-January-2003)
 
 
Literature
 
1. Rich Witek, Dick Sites. Alpha Architecture Technical Summary, 1992.
 
2. Richard L. Sites. Alpha AXP Architecture, Digital Technical Journal, Vol. 4, No. 4, Special Issue, 1992.
 
3. Daniel W. Dobberpuhl, and others. A 200-MHz 64-bit Dual-issue CMOS Microprocessor, Digital Technical Journal, Vol. 4, No. 4, Special Issue, 1992.
 
4. Edward McLellan. The Alpha AXP Architecture and 21064 Processor, IEEE Micro, 1993.
 
5. Dina L. McKinney, and others. Digital's DECchip 21066: The First Cost-focused Alpha AXP chip, Digital Technical Journal, 1994.
 
6. Robert Couranz. The E2COTS System and Alpha AXP Technology: The New Computer Standard for Military Use, Digital Technical Journal, Vol. 6, No. 2, 1994.
 
7. Samyojita A. Nadkarni, and others. Development of Digital's PCI Chip Sets and Evaluation Kit for the DECchip 21064 Microprocessor, Digital Technical Journal, Vol. 6, No. 2, 1994.
 
8. Linley Gwennap. Digital Leads the Pack with 21164, Microprocessor Report, Vol. 8, No. 12, 1994.
 
9. William J. Bowhill, and others. Circuit Implementation of a 300-MHz 64-bit Second-generation CMOS Alpha CPU, Digital Technical Journal, Vol. 7, No. 1, 1995.
 
10. David P. Hunter, Eric B. Betts. Measured Effects of Adding Byte and Word Instructions to the Alpha Architecture, Digital Technical Journal, Vol. 8, No. 4, 1996.
 
11. Linley Gwennap. Digital, MIPS Add Multimedia Extensions, Microprocessor Report, Vol. 10, No. 15, 1996.
 
12. Daniel Leibholz, Rahul Razdan. The Alpha 21264: A 500 MHz Out-of-Order Execution Microprocessor, Proceedings of IEEE COMPCON'97, 1997.
 
13. Michael K. Gowan, Larry L. Biro, Daniel B. Jackson. Power Considerations in the Design of the Alpha 21264 Microprocessor, DAC 98, June 15-19, 1998.
 
14. Linley Gwennap. Compaq, Intel Fight Digital Brain Drain, Microprocessor Report, Vol. 12, No. 14, October 26, 1998.
 
15. Linley Gwennap. Alpha 21364 to Ease Memory Bottleneck, Microprocessor Report, Vol. 12, No. 14, October 26, 1998.
 
16. M. Matson, and others. Circuit Implementation of a 600 MHz Superscalar RISC Microprocessor, Compaq Technology Journal, 1998.
 
17. Chart Watch: Workstation Processors, Microprocessor Report, May 10, 1999.
 
18. Daniel W. Bailey. High-Performance Alpha Microprocessor Design, Compaq Computer Corporation, 1999.
 
19. Exploring Alpha Power for Technical Computing, Compaq Technology Brief, April 2000.
 
20. Zarka Cvetanovic. Performance Analysis of the Alpha 21364-based HP GS1280 Multiprocessor, Hewlett-Packard Corporation, 2002.
 
21. Kevin Krewell. Alpha EV7 Processor: A High-Performance Tradition Continues, Microprocessor Report, April 5, 2002.
 
22. Ronald P. Preston. Design of an 8-wide Superscalar RISC Microprocessor with Simultaneous Multithreading, Compaq Computer Corporation, ISSCC Report, 2002.
 
23. Peter N. Glaskowsky. Moore, Moore, and More at ISSCC, Microprocessor Report, March 23, 2003.
 
Many technical documents by DEC and Compaq have been used; this article wouldn't be complete without them.
 
The author credits Wikipedia for information about DEC's early history and the products of those old days, and Terry Shannon for his regular and informative newsletter "Shannon Knows {DEC, Compaq, HPC}".
 
This paper contains information collected from many unofficial Internet resources, the full list of which is too long to be given here; a big and sincere thank-you goes to all their authors for especially interesting facts, comments, points of view etc.
 
The photographs of EV4 and EV6 are courtesy of cpu-collector.com.
 
Special thanks for extremely useful notes and suggestions made while preparing this article go to ISA_user, VLev, Yury_Malich, Stranger_NN and, of course, matik! (all from forum.radeon.ru)

Copyright (c) Paul V. Bolotoff, 2005-2006. All rights reserved.
A full or partial reprint without a permission from the author is prohibited.