Datapath: performance

The average instruction time could be lower if we didn't have to use the load instruction to determine the cycle time.

But that's not actually the longest instruction time! What about multiply and divide, or even floating-point operations?

These operations need separate, slower ALUs.

Assume: 8 ns for floating-point add, 16 ns for floating-point multiply.

(all times in ns)

Instruction    Inst  Reg   ALU  Data  Reg    Total  Distribution  Weighted
type           mem   read  op   mem   write                       time
R-type          2     1     2    0     1       6        27%         1.62
load            2     1     2    2     1       8        31%         2.48
store           2     1     2    2     0       7        21%         1.47
branch          2     1     2    0     0       5         5%         0.25
jump            2     0     0    0     0       2         2%         0.04
FP add          2     1     8    0     1      12         7%         0.84
FP multiply     2     1    16    0     1      20         7%         1.40
                                                      Average:      8.10

The average is 8.1 ns, which is only 8.1/20 ≈ 41% of the 20 ns single-cycle time (set by the slowest instruction, FP multiply). In other words, a variable clock cycle could be about 2.5 times faster!

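As a quick sanity check (not part of the original slide), the sketch below rebuilds the Total column from the per-unit times, takes the mix-weighted average, and compares it against the fixed 20 ns cycle; the variable and dictionary names are just illustrative.

```python
# Sketch: verify the weighted-average cycle time and the speedup claim.
# Per-unit times (ns): inst mem 2, reg read 1, ALU 2, data mem 2, reg write 1;
# FP add and FP multiply use 8 ns and 16 ns functional units instead of the 2 ns ALU.

# instruction type -> (total time in ns, fraction of the instruction mix)
mix = {
    "R-type":      (2 + 1 + 2 + 0 + 1,  0.27),  #  6 ns
    "load":        (2 + 1 + 2 + 2 + 1,  0.31),  #  8 ns
    "store":       (2 + 1 + 2 + 2 + 0,  0.21),  #  7 ns
    "branch":      (2 + 1 + 2 + 0 + 0,  0.05),  #  5 ns
    "jump":        (2 + 0 + 0 + 0 + 0,  0.02),  #  2 ns
    "FP add":      (2 + 1 + 8 + 0 + 1,  0.07),  # 12 ns
    "FP multiply": (2 + 1 + 16 + 0 + 1, 0.07),  # 20 ns
}

average = sum(total * frac for total, frac in mix.values())
single_cycle = max(total for total, _ in mix.values())

print(f"average variable-cycle time: {average:.2f} ns")               # 8.10 ns
print(f"single-cycle time:           {single_cycle} ns")              # 20 ns (FP multiply)
print(f"potential speedup:           {single_cycle / average:.2f}x")  # ~2.47x
```
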
A variable clock cycle is impractical, but we can break the instructions up into shorter cycles, then use only the parts needed for any given instruction.

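To make that idea concrete, here is a toy sketch (my own illustration, not a full multicycle design): each instruction type is listed with the steps it actually uses, taken straight from the nonzero columns of the table; with one short clock cycle per step, an instruction takes only as many cycles as it needs.

```python
# Toy illustration: each instruction only spends cycles on the steps it uses
# (the nonzero columns of the table above).
steps_used = {
    "R-type": ["inst mem", "reg read", "ALU", "reg write"],
    "load":   ["inst mem", "reg read", "ALU", "data mem", "reg write"],
    "store":  ["inst mem", "reg read", "ALU", "data mem"],
    "branch": ["inst mem", "reg read", "ALU"],
    "jump":   ["inst mem"],
}

for itype, steps in steps_used.items():
    # One short clock cycle per step: cycle count varies by instruction type.
    print(f"{itype:7} -> {len(steps)} short cycles ({', '.join(steps)})")
```
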