Memory: performance
The memory hierarchy can have an important effect on performance
Matrix multiply loop nest:
    for (i = 0; i < 500; i++)
        for (j = 0; j < 500; j++)
            for (k = 0; k < 500; k++)
                x[i][j] = x[i][j] + y[i][k] * z[k][j];
Running time on a Silicon Graphics system with a MIPS R4000 processor
and 1MB secondary cache: 77.2 seconds
If the loop order is reversed so i is innermost: 44.2 seconds
Only difference: the order in which the data is accessed
With other compiler optimizations: less than 10 seconds!
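The two loop orders compared above can be sketched as a pair of C functions, to make the "only difference" concrete. This is a minimal sketch under assumptions not in the slide: the matrices are stored as flat row-major arrays of double, the size n is a parameter rather than the fixed 500, and the names matmul_ijk, matmul_kji, matmul_fn and time_matmul are hypothetical. It is not expected to reproduce the 77.2 and 44.2 second figures, which belong to the system described above.

```c
#include <stddef.h>
#include <time.h>

/* Original order from the slide: i outermost, k innermost.
   x, y, z are n-by-n matrices in flat row-major storage (an assumption). */
void matmul_ijk(size_t n, double *x, const double *y, const double *z)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            for (size_t k = 0; k < n; k++)
                x[i * n + j] += y[i * n + k] * z[k * n + j];
}

/* Reversed order: k outermost, i innermost.
   Same arithmetic per element, different order of memory accesses. */
void matmul_kji(size_t n, double *x, const double *y, const double *z)
{
    for (size_t k = 0; k < n; k++)
        for (size_t j = 0; j < n; j++)
            for (size_t i = 0; i < n; i++)
                x[i * n + j] += y[i * n + k] * z[k * n + j];
}

/* Hypothetical helper type: either of the two functions above. */
typedef void (*matmul_fn)(size_t, double *, const double *, const double *);

/* CPU seconds spent in one call of f, measured with the standard clock(). */
double time_matmul(matmul_fn f, size_t n, double *x,
                   const double *y, const double *z)
{
    clock_t start = clock();
    f(n, x, y, z);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

To compare the two orders, zero x, call time_matmul once with each function on the same y and z, and print the two results. Both functions add the products into each x element in the same k order, so they produce identical matrices. Which one runs faster depends on the array layout, the compiler, and the cache sizes of the machine.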