From: neideck@kar.dec.com (Burkhard Neidecker-Lutz)
Newsgroups: comp.arch
Subject: Re: Alpha 21264 info?
Date: 24 Oct 1996 17:44:07 GMT

The title of the talk:

    The 21264: A Superscalar Alpha Processor with Out-of-Order Execution

So, well, Alpha has finally joined the brainiac club. The trick is, we
didn't sacrifice the trademark clock speed while doing so. And while we
were at it, we fixed a couple of nuisances that occasionally gave us
surprises with the older Alpha implementations.

The marketing highlights:

    Estimated SPECint95 of 30+, SPECfp95 of 50+
    Much better cache and memory system
    500 MHz+ operation in 0.35 um process
    4-way out-of-order execution
    MPEG2 @ MLP *encode* in real time

Now the details.

Physical:

    Same 0.35 um CMOS process as used for the 500 MHz 21164, but with two
    additional metal layers for power distribution (so it's a 6-layer
    metal process).
    Die size approx. 300 square mm, 15.2 million transistors.
    Speed bins starting at 500 MHz; power is 60 watts @ 500 MHz.
    588-pin Pin Grid Array package.
Logical:

    64 KByte 2-way set-associative instruction cache
    64 KByte 2-way set-associative data cache
    4 integer units (2 of which are also load-store units)
    2 floating point units
    7-stage integer pipeline
    10-stage floating point pipeline

Branching:

    Next line predictor (allows branches without fetch bubbles;
        allows dynamic prediction of computed jumps)
    Set predictor (allows 2-way associativity at high speed)
    Two-level branch predictor (runs a traditional 2-bit counter
        predictor and a global-pattern-detecting branch predictor in
        parallel and dynamically picks whichever one is right more often)
    Branch predictor about twice as good as the one in the 21164

Out-of-order execution:

    80 physical integer registers
        - 32 architectural
        - 8 PALcode shadow
        - 40 rename registers
    72 physical floating point registers
        - 32 architectural
        - 40 rename registers
    20-entry integer queue, quad-issue
    15-entry floating point queue, dual-issue
    The out-of-order mapper is a 500K-transistor structure and is one of
    the critical paths in the chip.
    80-entry CAM for mapping up to 80 instructions in flight.
    Backing out to any state takes 1 cycle.

Integer units:

    4 units:
        add/logic/motion-video/shift/branch
        add/logic/multiply/shift/branch
        add/logic/memory
        add/logic/memory
    To get that many register ports, this is implemented as two identical
    copies of an 80-entry register file, with two units attached to each
    copy. The two register files are kept identical, with a 1-cycle delay
    between clusters.

Floating point units:

    add/div/square root
    multiply
    4-cycle latency, fully pipelined.
    Divide is not pipelined; it retires 6 bits/cycle (compared to
    2 bits/cycle in the 21164).
    The new SQRT retires 2 bits/cycle (and also isn't pipelined).

Data cache, load-store reorder buffers:

    2 loads/stores per cycle, any combination
    implemented as a single-ported 1 GHz cache...
    32-entry load reorder buffer
    32-entry store reorder buffer
    Stores check the load buffer to enforce ordering
    Fine-grained cache control through cache prefetch instructions

Board level cache:

    L1 Dcache   8+ GByte/sec. sustained, 3-cycle load-to-use (like 21064)
    L2 cache    4+ GByte/sec. sustained, 128-bit separate port,
                12-cycle load-to-use

    The board-level cache can be built in 4 ways from 3 types of SRAM:

        1. No board-level cache
        2. 133 MHz Klamath-type Burst-RAM, 2.1 GByte/sec. bandwidth
        3. 250 MHz late-write SSRAM, 4.0 GByte/sec. bandwidth
        4. 333 MHz dual-data clock-forwarding FSRAM, 5.3 GByte/sec. bandwidth

    The board-level cache can be 0, 1, 2, 4, 8 or 16 MByte in size.

Memory system:

    System interface 2+ GByte/sec. sustained, 64-bit separate port,
    80-cycle load-to-use (with the Tsunami desktop chip set).
    16 outstanding memory references, 64 bytes each:
        - 8 reads
        - 8 writes
    With the Tsunami system chip set and SDRAMs, effective McCalpin STREAM
    bandwidth is 1.6 GByte/sec.

Availability:

    Samples Q1/97
    Volume  H2/97

So, it's vapor right now, but if you want to sell vapor in 1997, you'd
better have damn fast vapor...

Burkhard Neidecker-Lutz
EUROMEDIA - Distributed Multimedia Archives for Cooperative TV Production
CEC Karlsruhe, European Applied Research Center, Digital Equip. Corp.
email: neideck@kar.dec.com
AlphaStation 500/500: SPECint95 15.0, SPECfp95 20.4
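The Branching section describes a 2-bit counter predictor and a global-pattern predictor running in parallel, with the chip picking whichever has been right more often. That "pick the better one" idea can be sketched as a tournament predictor. This is a minimal Python sketch under assumptions not in the post:

- The table sizes and index functions are made up.
- The names (`TournamentPredictor`, `predict`, `update`) are hypothetical.
- The 21264's actual tables may be organized differently.

```python
class TournamentPredictor:
    """Hypothetical sketch of two component predictors plus a chooser.

    Table sizes and indexing are illustrative assumptions, not 21264 values.
    """

    def __init__(self, index_bits=10, history_bits=10):
        self.pc_mask = (1 << index_bits) - 1
        self.hist_mask = (1 << history_bits) - 1
        self.history = 0  # global taken/not-taken history, newest bit = LSB
        # 2-bit saturating counters (0..3), start weakly not-taken (1)
        self.local = [1] * (1 << index_bits)     # indexed by branch address
        self.glob = [1] * (1 << history_bits)    # indexed by global history
        # chooser counters: value >= 2 means "trust the global predictor"
        self.chooser = [1] * (1 << history_bits)

    @staticmethod
    def _bump(table, i, up):
        # saturating 2-bit counter update
        table[i] = min(table[i] + 1, 3) if up else max(table[i] - 1, 0)

    def _indices(self, pc):
        return pc & self.pc_mask, self.history & self.hist_mask

    def predict(self, pc):
        li, gi = self._indices(pc)
        use_global = self.chooser[gi] >= 2
        counter = self.glob[gi] if use_global else self.local[li]
        return counter >= 2

    def update(self, pc, taken):
        li, gi = self._indices(pc)
        local_ok = (self.local[li] >= 2) == taken
        global_ok = (self.glob[gi] >= 2) == taken
        # train the chooser only when exactly one component was right
        if local_ok != global_ok:
            self._bump(self.chooser, gi, global_ok)
        self._bump(self.local, li, taken)
        self._bump(self.glob, gi, taken)
        self.history = ((self.history << 1) | int(taken)) & self.hist_mask
```

A branch that alternates taken/not-taken defeats the 2-bit counter on its own. The global-history side learns it, and after a few iterations the chooser switches over to that side.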
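The integer register counts above (80 physical = 32 architectural + 8 PALcode shadow + 40 rename) fit a rename map backed by a free list. In a simple model of that kind, at most 40 renamed integer results can be waiting to retire before the front end stalls. This is a minimal Python sketch under these assumptions:

- `RenameMap` and its methods are hypothetical names.
- The shadow registers are simply numbered 32..39.
- Recovery ("backing out to any state") is not modelled.

```python
from collections import deque


class RenameMap:
    """Hypothetical rename map with a free list of physical registers."""

    def __init__(self, arch_regs=32, shadow_regs=8, rename_regs=40):
        n_mapped = arch_regs + shadow_regs
        # each architectural/shadow name starts on its own physical register
        self.map = list(range(n_mapped))
        # the remaining physical registers are free for renaming
        self.free = deque(range(n_mapped, n_mapped + rename_regs))

    def rename(self, dest, srcs):
        """Return (physical sources, new physical dest, old physical dest),
        or None if no rename register is free (front end must stall)."""
        if not self.free:
            return None
        phys_srcs = [self.map[s] for s in srcs]  # read sources before dest
        old = self.map[dest]
        new = self.free.popleft()
        self.map[dest] = new
        return phys_srcs, new, old

    def retire(self, old):
        # once the renaming instruction retires, the old mapping is dead
        self.free.append(old)
```

Reading the source mappings before installing the new destination mapping lets an instruction like `r1 = r1 + r2` read the old copy of `r1`.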
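The SRAM bandwidth figures in the board-level cache list are consistent with simple clock × width arithmetic on the 128-bit L2 port. The memory numbers can also be bounded with Little's law (bytes in flight divided by latency). This is a minimal Python sketch under these assumptions:

- Each SRAM moves one 128-bit transfer per quoted clock. For the "dual-data" FSRAM, the 333 MHz figure is treated as the effective data rate.
- The L1 figure assumes two 64-bit accesses per 500 MHz cycle.
- The 80-cycle memory latency is counted in 500 MHz core cycles.
- The function names are hypothetical.

```python
def peak_bandwidth_gb_s(clock_mhz, bus_bits):
    """Peak bandwidth in GB/s (1e9 bytes/s), assuming one bus_bits-wide
    transfer per clock (an assumption, not a datasheet formula)."""
    bytes_per_transfer = bus_bits / 8
    return clock_mhz * 1e6 * bytes_per_transfer / 1e9


def littles_law_gb_s(outstanding, bytes_each, latency_cycles, clock_mhz):
    """Upper bound on sustained bandwidth from Little's law:
    bytes in flight divided by the round-trip latency."""
    latency_s = latency_cycles / (clock_mhz * 1e6)
    return outstanding * bytes_each / latency_s / 1e9


if __name__ == "__main__":
    # board-level cache options, all on the 128-bit L2 port
    for name, mhz in [("Burst-RAM", 133), ("late-write SSRAM", 250), ("FSRAM", 333)]:
        print(f"{name:18s} {peak_bandwidth_gb_s(mhz, 128):.1f} GB/s")
    # L1: two 64-bit accesses per cycle at 500 MHz
    print(f"L1 Dcache          {peak_bandwidth_gb_s(500, 2 * 64):.1f} GB/s")
    # memory: 8 outstanding 64-byte reads, 80 cycles at 500 MHz
    print(f"memory read bound  {littles_law_gb_s(8, 64, 80, 500):.1f} GB/s")
```

This reproduces the 2.1, 4.0 and 5.3 GByte/sec. SRAM figures and the 8 GByte/sec. L1 figure. It also gives a 3.2 GByte/sec. bound from the 8 outstanding reads alone, which sits above the quoted 2+ GByte/sec. sustained and 1.6 GByte/sec. STREAM numbers.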