CMSC 411
Computer Systems Architecture
Lecture 2
Trends in Technology

Moore’s Law: 2X transistors / “year”

- “Cramming More Components onto Integrated Circuits”
  - Gordon Moore, Electronics, 1965
- # of transistors per cost-effective integrated circuit doubles every N months (12 ≤ N ≤ 24)
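N is given only as a range, and the compounded growth differs a lot between its ends. Below is a minimal sketch of the growth factor 2^(12t/N) after t years. The 10-year horizon is an arbitrary choice for illustration, not a figure from the paper.

```python
def moores_law_factor(years: float, doubling_months: float) -> float:
    """Growth in transistors per chip after `years` if the count doubles every `doubling_months`."""
    return 2.0 ** (years * 12.0 / doubling_months)


# Compare both ends of the 12 <= N <= 24 range (plus the midpoint) over 10 years.
for n in (12, 18, 24):
    print(f"N = {n:2d} months: {moores_law_factor(10, n):7.1f}X after 10 years")
```

At N = 12 the count grows 1024X in a decade. At N = 24 it grows only 32X.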
Tracking Technology Performance Trends

• Drill down into 4 technologies:
  – Disks,
  – Memory,
  – Network,
  – Processors

• Compare ~1980 Archaic (Nostalgic) vs. ~2000 Modern (Newfangled)
  – Performance Milestones in each technology

• Compare Bandwidth vs. Latency improvements in performance over time

• Bandwidth: number of events per unit time
  – E.g., Mbits / second over network, Mbytes / second from disk

• Latency: elapsed time for a single event
  – E.g., one-way network delay in microseconds, average disk access time in milliseconds

Disks: Archaic (Nostalgic) v. Modern (Newfangled)

<table>
<thead>
<tr>
<th>Archaic (Nostalgic)</th>
<th>Modern (Newfangled)</th>
</tr>
</thead>
<tbody>
<tr>
<td>CDC Wren I, 1983</td>
<td>Seagate 373453, 2003</td>
</tr>
<tr>
<td>3600 RPM</td>
<td>15000 RPM (4X)</td>
</tr>
<tr>
<td>0.03 GBytes capacity</td>
<td>73.4 GBytes (2500X)</td>
</tr>
<tr>
<td>Tracks/Inch: 800</td>
<td>Tracks/Inch: 64000 (80X)</td>
</tr>
<tr>
<td>Bits/Inch: 9550</td>
<td>Bits/Inch: 533,000 (60X)</td>
</tr>
<tr>
<td>Three 5.25” platters</td>
<td>Four 2.5” platters (in 3.5” form factor)</td>
</tr>
<tr>
<td>Bandwidth: 0.6 MBytes/sec</td>
<td>Bandwidth: 86 MBytes/sec (140X)</td>
</tr>
<tr>
<td>Latency: 48.3 ms</td>
<td>Latency: 5.7 ms (8X)</td>
</tr>
<tr>
<td>Cache: none</td>
<td>Cache: 8 MBytes</td>
</tr>
</tbody>
</table>
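The multipliers in the table are rounded. The short sketch below recomputes them from the table's own values, and it shows the gap between the bandwidth and latency gains directly. The dictionary keys are illustrative names, not terms from the source.

```python
# Values copied from the disk table (1983 CDC Wren I vs. 2003 Seagate 373453).
old = {"rpm": 3600, "capacity_gb": 0.03, "tracks_per_inch": 800,
       "bits_per_inch": 9550, "bandwidth_mb_s": 0.6, "latency_ms": 48.3}
new = {"rpm": 15000, "capacity_gb": 73.4, "tracks_per_inch": 64000,
       "bits_per_inch": 533_000, "bandwidth_mb_s": 86, "latency_ms": 5.7}


def improvement(metric: str) -> float:
    """Improvement factor; latency improves when it shrinks, so its ratio is inverted."""
    if metric == "latency_ms":
        return old[metric] / new[metric]
    return new[metric] / old[metric]


ratios = {m: improvement(m) for m in old}
for metric, ratio in ratios.items():
    print(f"{metric:16s} {ratio:8.1f}X")
```

Bandwidth comes out near 143X, while latency improves only about 8.5X. The bandwidth gain is therefore larger than the square of the latency gain (8.5² ≈ 72).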
Latency Lags Bandwidth (for last ~20 years)

- **Performance Milestones**
- **Disk**: 3600, 5400, 7200, 10000, 15000 RPM (8x, 143x)
  (improvements given as (latency, bandwidth); latency = simple operation w/o contention, BW = best case)

---

Memory: Archaic (Nostalgic) v. Modern (Newfangled)

- 1980 DRAM (asynchronous)
  - 0.064 Mbits/chip
  - 64,000 xtors, 35 mm²
  - 16-bit data bus per module, 16 pins/chip
  - 13 Mbytes/sec
  - Latency: 225 ns
  - (no block transfer)
- 2000 Double Data Rate Synchronous (clocked) DRAM
  - 256 Mbits/chip (4000X)
  - 256,000,000 xtors, 204 mm²
  - 64-bit data bus per DIMM, 66 pins/chip (4X)
  - 1600 Mbytes/sec (120X)
  - Latency: 52 ns (4X)
  - Block transfers (page mode)
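The 1600 MBytes/sec figure follows from bus width times transfer rate. The minimal sketch below assumes the module is a PC1600 (DDR-200) DIMM with a 100 MHz bus clock. The slide itself states only the 64-bit width and the bandwidth.

```python
def peak_bandwidth_mb_s(bus_bits: int, clock_mhz: float, transfers_per_clock: int) -> float:
    """Peak bandwidth in MBytes/sec = bytes per transfer x millions of transfers per second."""
    bytes_per_transfer = bus_bits // 8
    return bytes_per_transfer * clock_mhz * transfers_per_clock


# Assumed 100 MHz clock; DDR transfers on both clock edges, plain SDRAM on one.
ddr = peak_bandwidth_mb_s(bus_bits=64, clock_mhz=100.0, transfers_per_clock=2)
sdr = peak_bandwidth_mb_s(bus_bits=64, clock_mhz=100.0, transfers_per_clock=1)
print(ddr, sdr)  # 1600.0 800.0
```

Transferring on both clock edges doubles peak bandwidth at the same clock rate. By itself, this does nothing to shorten access latency.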
LANs: Archaic (Nostalgic) v. Modern (Newfangled)

- Ethernet 802.3
  - Year of Standard: 1978
  - 10 Mbits/s link speed
  - Latency: 3000 µsec
  - Shared media
  - Coaxial cable
- Ethernet 802.3ae
  - Year of Standard: 2003
  - 10,000 Mbits/s (1000X) link speed
  - Latency: 190 µsec (15X)
  - Switched media
  - Category 5 copper wire

"Cat 5" is 4 twisted pairs in a bundle.

Twisted pair: copper, 1 mm thick, twisted to avoid the antenna effect.
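A 1000X faster link does not make a small message 1000X faster. To see why, a simple model adds the quoted latency to the serialization time (message size divided by link speed). The model and the 1500-byte frame size are assumptions for illustration.

```python
FRAME_BYTES = 1500  # assumed message size: one maximum-size standard Ethernet payload


def transfer_time_us(latency_us: float, link_mbit_s: float, size_bytes: int) -> float:
    """Simple model: fixed latency plus serialization time (size / link speed)."""
    serialization_us = size_bytes * 8 / link_mbit_s  # bits / (Mbit/s) = microseconds
    return latency_us + serialization_us


old_t = transfer_time_us(3000, 10, FRAME_BYTES)      # Ethernet 802.3, 1978
new_t = transfer_time_us(190, 10_000, FRAME_BYTES)   # Ethernet 802.3ae, 2003
print(f"{old_t:.1f} us -> {new_t:.1f} us ({old_t / new_t:.0f}X)")
```

Under this model, a 1500-byte frame goes from 4200.0 µs to 191.2 µs, about 22X. That is far below the 1000X link-speed gain, because the fixed latency dominates.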
Latency Lags Bandwidth (last ~20 years)

- Performance Milestones
  - Ethernet: 10Mb, 100Mb, 1000Mb, 10000 Mb/s (16x, 1000x)
  - Memory Module: 16bit plain DRAM, Page Mode DRAM, 32b, 64b, SDRAM, DDR SDRAM (4x, 120x)
  - Disk: 3600, 5400, 7200, 10000, 15000 RPM (8x, 143x)

(improvements given as (latency, bandwidth); latency = simple operation w/o contention, BW = best case)

CPUs: Archaic (Nostalgic) v. Modern (Newfangled)

- 1982 Intel 80286
  - 12.5 MHz
  - 2 MIPS (peak)
  - Latency 320 ns
  - 134,000 xtors, 47 mm²
  - 16-bit data bus, 68 pins
  - Microcode interpreter, separate FPU chip
  - (no caches)
- 2001 Intel Pentium 4
  - 1500 MHz (120X)
  - 4500 MIPS (peak) (2250X)
  - Latency 15 ns (20X)
  - 42,000,000 xtors, 217 mm²
  - 64-bit data bus, 423 pins
  - 3-way superscalar, dynamic translation to RISC, superpipelined (22 stages), out-of-order execution
  - On-chip 8KB data cache, 96KB instruction trace cache, 256KB L2 cache
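Peak MIPS divided by clock rate in MHz gives peak instructions per clock cycle. That ratio splits the 2250X gain into a clock-rate part and a per-cycle part. The small sketch below uses the peak figures above.

```python
def peak_instructions_per_clock(mips: float, mhz: float) -> float:
    """MIPS / MHz = (10^6 instructions/s) / (10^6 cycles/s) = instructions per clock cycle."""
    return mips / mhz


i286 = peak_instructions_per_clock(2, 12.5)     # Intel 80286, 1982
p4 = peak_instructions_per_clock(4500, 1500)    # Intel Pentium 4, 2001
print(i286, p4)  # 0.16 3.0
```

The 80286 peaks at 0.16 instructions per clock and the Pentium 4 at 3.0, which matches its 3-way superscalar width. A 120X gain in clock rate times an 18.75X gain in instructions per clock gives the 2250X.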
Latency Lags Bandwidth (last ~20 years)

- **Performance Milestones**
- **Processor**: ’286, ’386, ’486, Pentium, Pentium Pro, Pentium 4 (21x, 2250x)
- **Ethernet**: 10Mb, 100Mb, 1000Mb, 10000 Mb/s (16x, 1000x)
- **Memory Module**: 16bit plain DRAM, Page Mode DRAM, 32b, 64b, SDRAM, DDR SDRAM (4x, 120x)
- **Disk**: 3600, 5400, 7200, 10000, 15000 RPM (8x, 143x)

**Rule of Thumb for Latency Lagging BW**

- **In the time that bandwidth doubles, latency improves by no more than a factor of 1.2 to 1.4**
  (and capacity improves faster than bandwidth)
- **Stated alternatively:**
  Bandwidth improves by more than the square of the improvement in Latency
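The rule of thumb can be checked against the milestone lists. Suppose bandwidth improves by B and latency by L over the same period. Then the exponent k = log B / log L satisfies B = L^k, and k > 2 means bandwidth improved by more than the square of the latency gain. If both improve at steady exponential rates (an assumption of this sketch), latency improves by 2^(1/k) while bandwidth doubles. The sketch uses the (latency, bandwidth) pairs from the milestones.

```python
import math

# (latency improvement, bandwidth improvement) pairs from the milestone lists
milestones = {
    "Processor": (21, 2250),
    "Ethernet": (16, 1000),
    "Memory module": (4, 120),
    "Disk": (8, 143),
}

for tech, (lat_x, bw_x) in milestones.items():
    k = math.log(bw_x) / math.log(lat_x)   # bandwidth gain = (latency gain) ** k
    lat_per_bw_doubling = 2 ** (1 / k)     # latency gain while bandwidth doubles
    print(f"{tech:14s} k = {k:.2f}, latency x{lat_per_bw_doubling:.2f} per BW doubling")
```

Every technology comes out with k above 2. Each also shows a latency gain between about 1.22 and 1.34 per bandwidth doubling, which is consistent with the 1.2 to 1.4 rule.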
6 Reasons Latency Lags Bandwidth

1. Moore’s Law helps BW more than latency
   - Faster transistors, more transistors, more pins help Bandwidth
     - MPU Transistors: 0.134 vs. 42 M xtors (300X)
     - DRAM Transistors: 0.064 vs. 256 M xtors (4000X)
     - MPU Pins: 68 vs. 423 pins (6X)
     - DRAM Pins: 16 vs. 66 pins (4X)
   - Smaller, faster transistors, but they communicate over (relatively) longer wires, which limits latency
     - Feature size: 1.5 to 3 vs. 0.18 micron (8X, 17X)
     - MPU Die Size: 47 vs. 217 mm² (sqrt of ratio ⇒ 2X)
     - DRAM Die Size: 35 vs. 204 mm² (sqrt of ratio ⇒ 2X)

2. Distance limits latency
   - Size of DRAM block ⇒ long bit and word lines ⇒ most of DRAM access time
   - Speed of light limits latency between computers on a network

3. Bandwidth easier to sell (“bigger=better”)
   - E.g., 10 Gbits/s Ethernet (“10 Gig”) vs. 10 μsec latency Ethernet
   - 4400 MB/s DIMM (“PC4400”) vs. 50 ns latency
   - Even if just marketing, customers now trained
   - Since bandwidth sells, more resources thrown at bandwidth, which further tips the balance
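Reason 1 compares transistor growth against die growth. The small sketch below uses the transistor counts and die areas from the Memory and CPU slides: DRAM 64,000 vs. 256,000,000 xtors on 35 vs. 204 mm², and 80286 134,000 vs. Pentium 4 42,000,000 xtors on 47 vs. 217 mm². The sketch treats the square root of the area ratio as the growth in die edge length, a rough proxy for the longest wires. That proxy is an assumption of this sketch.

```python
import math

# (1980s value, ~2000 value) pairs taken from the Memory and CPU slides
transistors_m = {"MPU": (0.134, 42), "DRAM": (0.064, 256)}
die_mm2 = {"MPU": (47, 217), "DRAM": (35, 204)}

for part in ("MPU", "DRAM"):
    t_old, t_new = transistors_m[part]
    a_old, a_new = die_mm2[part]
    # A die's edge length, and so roughly its longest wires, scales with sqrt(area).
    edge_growth = math.sqrt(a_new / a_old)
    print(f"{part}: {t_new / t_old:.0f}X transistors, {edge_growth:.1f}X edge length")
```

Transistor counts grow about 313X (MPU) and 4000X (DRAM). Die edges grow only about 2.1X and 2.4X.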
6 Reasons Latency Lags Bandwidth (cont’d)

4. Latency helps BW, but not vice versa
   • Spinning disk faster improves both bandwidth and rotational latency
     » 3600 RPM ⇒ 15000 RPM = 4.2X
     » Average rotational latency: 8.3 ms ⇒ 2.0 ms
     » All other things being equal, also helps BW by 4.2X
   • Lower DRAM latency ⇒ More access/second (higher bandwidth)
   • Higher linear density helps disk BW (and capacity), but not disk Latency
     » 9,550 BPI ⇒ 533,000 BPI ⇒ 60X in BW

5. Bandwidth hurts latency
   • Queues help Bandwidth, hurt Latency (Queuing Theory)
   • Adding chips to widen a memory module increases Bandwidth but higher fan-out on address lines may increase Latency

6. Operating System overhead hurts Latency more than Bandwidth
   • Long messages amortize overhead; overhead bigger part of short messages
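Reasons 4 and 5 rest on two standard formulas. Average rotational latency is the time for half a revolution, and queuing delay grows sharply with utilization. The queue below uses the textbook M/M/1 model, where mean time in system = service time / (1 − utilization). That model is an assumption here, because the slide cites queuing theory without naming one.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """On average the desired sector is half a revolution away."""
    return 0.5 * 60_000 / rpm  # 60,000 ms per minute


def mm1_response_time(service_time: float, utilization: float) -> float:
    """Assumed M/M/1 queue: mean time in system = service time / (1 - utilization)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1 - utilization)


print(f"{avg_rotational_latency_ms(3600):.1f} ms -> {avg_rotational_latency_ms(15000):.1f} ms")
for rho in (0.5, 0.8, 0.9):
    # Same 1 ms service time: higher utilization (more throughput) means longer waits.
    print(f"utilization {rho:.0%}: {mm1_response_time(1.0, rho):.1f} ms")
```

The first line reproduces the 8.3 ms ⇒ 2.0 ms figures. With the same 1 ms service time, raising utilization from 50% to 90% raises the mean response time from 2 ms to 10 ms. The extra throughput is bought with latency.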
Summary of Technology Trends

• For disk, LAN, memory, and microprocessor, bandwidth improves by at least the square of the latency improvement
  – In the time that bandwidth doubles, latency improves by no more than 1.2X to 1.4X

• Lag probably even larger in real systems, as bandwidth gains multiplied by replicated components
  – Multiple processors in a cluster or even in a chip
  – Multiple disks in a disk array
  – Multiple memory modules in a large memory
  – Simultaneous communication in switched LAN

• HW and SW developers should innovate assuming Latency Lags Bandwidth
  – If everything improves at the same rate, then nothing really changes
  – When rates vary, real innovation is required

TRENDS IN SILICON COSTS
Costs

- The component costs of a $1000 PC in 2001, as a share of the price, are:
  - CPU – 22%
  - Monitor – 19%
  - Hard drive – only 9%
  - DRAM – only 5% (for 128MB)
  - Software – 20% (OS & basic office suite)

Manufacture of DRAM and other chips

- Chips are manufactured on wafers - circular disks containing many dies (chips).
- The wafer is tested and chopped into dies.
Wafers and dies

- To find the cost of a die:
  - Number of dies per wafer is at most the area of the wafer divided by the area of the die.
  - The cost of the wafer divided by the number of working dies per wafer is the cost of each die.
- The fraction of working dies is called the die yield, which decreases as the area of the die increases.
- Rule of thumb: Cost of die is proportional to the square of the die area
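The cost steps above can be sketched as follows. The die-count bound comes from the slide. The Poisson yield model (yield = e^(−defect density × die area)) is a swapped-in simplification. The wafer cost, diameter, and defect density are hypothetical values chosen for illustration, not data from the lecture.

```python
import math


def dies_per_wafer_upper_bound(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """Upper bound from the slide: wafer area / die area (ignores partial dies at the edge)."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    return math.floor(wafer_area / die_area_mm2)


def die_yield(die_area_mm2: float, defects_per_mm2: float) -> float:
    """Assumed Poisson yield model: fraction of dies with zero defects."""
    return math.exp(-defects_per_mm2 * die_area_mm2)


def cost_per_die(wafer_cost: float, wafer_diameter_mm: float,
                 die_area_mm2: float, defects_per_mm2: float) -> float:
    """Wafer cost divided by the expected number of working dies."""
    good_dies = (dies_per_wafer_upper_bound(wafer_diameter_mm, die_area_mm2)
                 * die_yield(die_area_mm2, defects_per_mm2))
    return wafer_cost / good_dies


# Hypothetical inputs: $5000 wafer, 300 mm diameter, 0.004 defects per mm^2.
for area in (50, 100, 200):
    print(f"{area:3d} mm^2: ${cost_per_die(5000, 300, area, 0.004):.2f} per die")
```

With these inputs, doubling the die area from 100 to 200 mm² roughly triples the cost per die, because the die count halves and the yield also drops. The exact exponent depends on the yield model, which is why the area-squared relation is only a rule of thumb.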