### High Performance Computing Systems (CMSC714)





### Lecture 10: Shared Memory Architectures

### Abhinav Bhatele, Department of Computer Science



## Announcements

- Assignment 2 due on March 8
- Project description due on March 11



### Abhinav Bhatele (CMSC714)

## Summary of last lecture

- Single node architecture is fairly complex
  - Two product lines: fast processors, and low-frequency, low-power processors
- IBM Blue Gene/Q Compute Chip
- Accelerators: IBM Cell BE, AMD APUs, NVIDIA GPGPUs, Intel Xe






## Shared memory in hardware

- Cache-coherent, globally addressable memory
- Older machines used bus-based symmetric multiprocessing (SMP)
- SGI Origin used a different architecture: distributed shared memory with cache coherence

http://csweb.cs.wfu.edu/~torgerse/Kokua/SGI/007-3439-002/sgi_html/ch01.html





## SGI Origin 2000

- Up to 512 nodes; each node has 2 processors and up to 4 GB of memory
- Cache coherence maintained via a directory-based protocol
- Distributed directory that keeps track of each data block (page)
  - Implemented in hardware
  - Supports moving entire pages across nodes
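The bookkeeping behind a directory-based protocol can be illustrated with a toy model. This is a minimal sketch, not the Origin 2000 implementation: the three states, the `Directory` class, and its method names are assumptions made for illustration, and real protocols add more states, transient cases, and actual messages.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryEntry:
    """Directory state for one data block (hypothetical MSI-style states)."""
    state: str = "UNCACHED"                     # UNCACHED, SHARED, or EXCLUSIVE
    sharers: set = field(default_factory=set)   # ids of nodes holding a copy


class Directory:
    """Toy directory: tracks who caches each block and counts invalidations."""

    def __init__(self):
        self.entries = {}
        self.invalidations = 0  # invalidation messages that would be sent

    def _entry(self, block):
        return self.entries.setdefault(block, DirectoryEntry())

    def read(self, node, block):
        e = self._entry(block)
        if e.state == "EXCLUSIVE" and e.sharers == {node}:
            return  # the owner already holds a valid copy
        # Any previous owner is downgraded to a sharer (write-back implied).
        e.state = "SHARED"
        e.sharers.add(node)

    def write(self, node, block):
        e = self._entry(block)
        others = e.sharers - {node}
        self.invalidations += len(others)  # one invalidation per remote copy
        e.state = "EXCLUSIVE"
        e.sharers = {node}


d = Directory()
d.read(0, block=7)
d.read(1, block=7)
d.write(2, block=7)   # nodes 0 and 1 must be invalidated
print(d.entries[7].state, d.entries[7].sharers, d.invalidations)
```

Because the directory records exactly which nodes hold a copy, a write only contacts those nodes, instead of broadcasting to every processor as a bus-based scheme would.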





## Hypercube network

- General topology: k-ary n-cube networks
- Hypercube: k = 2 (a binary n-cube with 2^n nodes)
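A short sketch of how hypercube connectivity follows from node ids: in an n-dimensional hypercube, two nodes are neighbors when their binary ids differ in exactly one bit. The function names below are illustrative, not taken from any library.

```python
def kary_ncube_size(k, n):
    """Number of nodes in a k-ary n-cube: k nodes along each of n dimensions."""
    return k ** n


def hypercube_neighbors(node, n):
    """Neighbors of `node` in an n-dimensional hypercube (k = 2).

    Flipping bit d of the node id gives the neighbor along dimension d.
    """
    return [node ^ (1 << d) for d in range(n)]


def hop_distance(a, b):
    """Minimum number of hops between two hypercube nodes (Hamming distance)."""
    return bin(a ^ b).count("1")


print(kary_ncube_size(2, 3))      # 8 nodes in a 3-D hypercube
print(hypercube_neighbors(5, 3))  # [4, 7, 1]
print(hop_distance(0, 7))         # 3
```

Each node has n links, and the farthest pair of nodes is n hops apart, which is why the hypercube diameter grows only logarithmically with the node count.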






## SGI Altix 3000

- Based on Intel Itanium 2 processors and Linux
- 4 processors and up to 32 GB of memory








## Fat-tree network

[Figure: 512-processor Altix 3000, 400 MB/sec/p dual-plane bisection bandwidth. Labels: Level 1 routers; two cables per line.]








## Partitioned global address space (PGAS)

- Another parallel programming model
- Globally addressable view of memory to the programmer
- Notable examples:
  - Sun's Fortress, IBM's X10, Cray's Chapel
  - Unified Parallel C (UPC), Coarray Fortran (CAF), Global Arrays (GA)
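The "partitioned" part of PGAS can be sketched as an address translation: every global index maps to an owning process and an offset in that process's local memory. The block distribution and the `block_owner` helper below are assumptions chosen for illustration; PGAS languages offer several distributions and their own syntax.

```python
def block_owner(global_index, n_elements, n_procs):
    """Map a global index to (owner process, local offset).

    Assumes a block distribution: each process owns one contiguous chunk
    of ceil(n_elements / n_procs) elements (the last chunk may be shorter).
    """
    block = -(-n_elements // n_procs)  # ceiling division
    return global_index // block, global_index % block


def is_local(global_index, my_rank, n_elements, n_procs):
    """True if the element lives in this process's partition."""
    owner, _ = block_owner(global_index, n_elements, n_procs)
    return owner == my_rank


# 10 elements over 4 processes: chunks of 3, 3, 3, and 1 elements.
print(block_owner(7, 10, 4))   # (2, 1)
print(is_local(7, 0, 10, 4))   # False: process 0 would need a remote access
```

The programmer can address any element by its global index, but the cost of an access depends on whether the owner is the local process or a remote one.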






## Global Arrays

- Developed at Pacific Northwest National Laboratory (PNNL)
- Computational science and engineering (CSE) applications that use it: NWChem, GAMESS-UK, Chimera
- Supports only array data structures

https://www.osc.edu/sites/osc.edu/files/staff_files/dhudak/ga-oscll.pdf



[Figure: physically distributed data presented as a single global address space]






## Get-compute-put model
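The pattern can be sketched with a single-process stand-in for a distributed array: copy a patch of the global array into a private buffer (get), work on the buffer locally (compute), and write the result back (put). `ToyGlobalArray` and its methods are hypothetical and are not the Global Arrays API; the sketch only mimics the data movement.

```python
class ToyGlobalArray:
    """Single-process stand-in for a block-distributed 1-D array (hypothetical)."""

    def __init__(self, n_elements, n_procs):
        self.block = -(-n_elements // n_procs)  # elements per process (ceiling)
        # chunks[p] models the memory physically held by process p.
        self.chunks = [
            [0] * max(0, min(self.block, n_elements - p * self.block))
            for p in range(n_procs)
        ]

    def get(self, lo, hi):
        """Copy global elements [lo, hi) into a private local buffer."""
        return [self.chunks[i // self.block][i % self.block]
                for i in range(lo, hi)]

    def put(self, lo, values):
        """Copy a local buffer back into the global array, starting at lo."""
        for i, v in enumerate(values, start=lo):
            self.chunks[i // self.block][i % self.block] = v


ga = ToyGlobalArray(n_elements=8, n_procs=4)
ga.put(0, [1, 2, 3, 4, 5, 6, 7, 8])

local = ga.get(2, 6)              # get: patch spans the chunks of processes 1 and 2
local = [2 * x for x in local]    # compute: purely local work on the copy
ga.put(2, local)                  # put: publish the result

print(ga.get(0, 8))               # [1, 2, 6, 8, 10, 12, 7, 8]
```

The caller never needs to know which process owns which elements; the index arithmetic inside `get` and `put` hides the physical distribution, which is the point of the model.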








# UNIVERSITY OF MARYLAND

### Questions?



**Abhinav Bhatele** 5218 Brendan Iribe Center (IRB) / College Park, MD 20742 phone: 301.405.4507 / e-mail: bhatele@cs.umd.edu