Worksheet # 2 Draft Solution (Cont'd)

3.2 The execution profile without bypassing and forwarding is as follows:

Clock Cycle #    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
Instruction 1    F  I  X  M  W
Instruction 2       F  S  S  S  I  X  M  W
Instruction 3                   F  S  S  S  I  X  M  W
Instruction 4                               F  S  S  S  I  X  M  W
Instruction 5                                           F  S  S  S

Clock Cycle #   18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
Instruction 5    I  X  M  W
Instruction 6    F  S  S  S  I  X  M  W
Instruction 7                F  I  X  M  W
Instruction 8                   F  S  S  S  I  X  M  W
Instruction 9                               F  S  S  S  I  X  M  W
Instruction 10                                          F  I  X  M
Instruction 11                                             F  S  S

Clock Cycle #   35 36 37 38 39
Instruction 10   W
Instruction 11   S  I  X  M  W

The execution profile with bypassing and forwarding is as follows:

Clock Cycle #    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
Instruction 1    F  I  X  M  W
Instruction 2       F  I  S  X  M  W
Instruction 3          F  S  S  I  X  M  W
Instruction 4                   S  F  I  X  M  W
Instruction 5                         S  F  I  X  M  W
Instruction 6                               S  F  I  X  M  W
Instruction 7                                     S  F  I  X  M  W
Instruction 8                                           S  F  I  X
Instruction 9                                                 S  F

Clock Cycle #   18 19 20 21 22 23 24 25
Instruction 8    M  W
Instruction 9    I  X  M  W
Instruction 10   S  F  I  X  M  W
Instruction 11         S  F  I  X  M  W

4.1 The execution profile without bypassing and forwarding is as follows:

Clock Cycle #    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
Instruction 1    F  I  X  M  W
Instruction 2       F  S  S  I  X  M  W
Instruction 3                F  S  S  I  X  M  W
Instruction 4                         F  S  S  I  X  M  W
Instruction 5                                  F  S  S  I  X  M  W
Instruction 6                                           F  S  S  I
Instruction 7                                                    F

Clock Cycle #   18 19 20 21 22 23 24 25 26 27 28 29 30 31
Instruction 6    X  M  W
Instruction 7    I  X  M  W
Instruction 8    F  S  S  I  X  M  W
Instruction 9             F  S  S  I  X  M  W
Instruction 10                     F  I  X  M  W
Instruction 11                        F  S  S  I  X  M  W

The execution profile with bypassing and forwarding is as follows:

Clock Cycle #    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
Instruction 1    F  I  X  M  W
Instruction 2       F  I  S  X  M  W
Instruction 3          F  S  I  S  X  M  W
Instruction 4                F  S  I  X  M  W
Instruction 5                      F  I  X  M  W
Instruction 6                         F  I  X  M  W
Instruction 7                            F  I  X  M  W
Instruction 8                               F  I  S  X  M  W
Instruction 9                                  F  S  I  X  M  W
Instruction 10                                       F  I  X  M  W
Instruction 11                                          F  I  S  X  M  W

4.2 The original code fragment was:

 

LD R1, 100(R1);

 

LD R2, 200(R1);

 

DADDI R1, R2, #3000;

 

DSUB R2, R4, R1;

This instruction writes a value to R2 that the next instruction destroys: ANDing with R0 yields zero whatever R2 holds. So, it can be removed.

AND R2, R0, R2;

That instruction puts zero in R2. We can use R0 instead. So, it can be removed.

SD R2, 400(R2);

It can be changed to SD R0, 400(R0).

LD R3, 100(R0);

That instruction loads a value to R3 that will be destroyed by the next instruction. So, it can be removed.

XOR R3, R3, R3;

That instruction puts zero in R3. We can use R0 instead. So, it can be removed.

SD R3, 0(R3);

It can be changed to SD R0, 0(R0).

LD R3, 0(R2);

R2 is zero here and the preceding store wrote zero to address 0, so this loads R3 with zero. We can use R0 instead. So, it can be removed.

SD R3, 100(R3);

It can be changed to SD R0, 100(R0).

 

The optimized code is:

 

LD R1, 100(R1);

LD R2, 200(R1);

DADDI R1, R2, #3000;

SD R0, 400(R0);

SD R0, 0(R0);

SD R0, 100(R0);

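The rewrite above can be sanity-checked by running both fragments on the same starting state and comparing the results. The following is only a minimal sketch: the `run` helper, the tuple encoding of instructions, and the initial register and memory values are all assumptions made for illustration, with memory modeled as a plain address-to-value dictionary and R0 treated as hardwired to zero.

```python
def run(program, regs, mem):
    """Run (op, a, b, c) tuples; return final (registers, memory) copies."""
    r, m = dict(regs), dict(mem)
    # R0 always reads as zero; unset registers and addresses default to 0.
    reg = lambda name: 0 if name == "R0" else r.get(name, 0)
    for op, a, b, c in program:
        if op == "SD":                 # SD Ra, b(Rc): mem[b + Rc] = Ra
            m[b + reg(c)] = reg(a)
            continue
        if op == "LD":                 # LD Ra, b(Rc): Ra = mem[b + Rc]
            value = m.get(b + reg(c), 0)
        elif op == "DADDI":            # DADDI Ra, Rb, #c
            value = reg(b) + c
        elif op == "DSUB":             # DSUB Ra, Rb, Rc
            value = reg(b) - reg(c)
        elif op == "AND":
            value = reg(b) & reg(c)
        elif op == "XOR":
            value = reg(b) ^ reg(c)
        else:
            raise ValueError(f"unsupported opcode: {op}")
        if a != "R0":                  # writes to R0 are discarded
            r[a] = value
    return r, m

ORIGINAL = [
    ("LD", "R1", 100, "R1"), ("LD", "R2", 200, "R1"),
    ("DADDI", "R1", "R2", 3000), ("DSUB", "R2", "R4", "R1"),
    ("AND", "R2", "R0", "R2"), ("SD", "R2", 400, "R2"),
    ("LD", "R3", 100, "R0"), ("XOR", "R3", "R3", "R3"),
    ("SD", "R3", 0, "R3"), ("LD", "R3", 0, "R2"),
    ("SD", "R3", 100, "R3"),
]
OPTIMIZED = [
    ("LD", "R1", 100, "R1"), ("LD", "R2", 200, "R1"),
    ("DADDI", "R1", "R2", 3000), ("SD", "R0", 400, "R0"),
    ("SD", "R0", 0, "R0"), ("SD", "R0", 100, "R0"),
]

# Arbitrary (hypothetical) starting state, chosen so every load hits a known value.
REGS = {"R1": 8, "R4": 7}
MEM = {108: 16, 216: 5, 0: 55, 100: 99, 400: 77}

regs_a, mem_a = run(ORIGINAL, REGS, MEM)
regs_b, mem_b = run(OPTIMIZED, REGS, MEM)
print(mem_a == mem_b)                  # True
print(regs_a["R2"], regs_b["R2"])      # 0 5
```

Under these assumptions the two fragments leave memory and R1 identical, but R2 (and, in general, R3) end with different values. The optimized code is therefore a drop-in replacement only if R2 and R3 are not read again after the fragment.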
 

Working problem 3.2 on the new code without bypassing and forwarding, we obtain:

Clock Cycle #    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
Instruction 1    F  I  X  M  W
Instruction 2       F  S  S  S  I  X  M  W
Instruction 3                   F  S  S  S  I  X  M  W
Instruction 4                               F  I  X  M  W
Instruction 5                                  F  I  X  M  W
Instruction 6                                     S  S  S  F  I  X  M  W

Working problem 3.2 on the new code with bypassing and forwarding, we obtain:

Clock Cycle #    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
Instruction 1    F  I  X  M  W
Instruction 2       F  I  S  X  M  W
Instruction 3          F  S  S  I  X  M  W
Instruction 4                   S  F  I  X  M  W
Instruction 5                         S  F  I  X  M  W
Instruction 6                               S  F  I  X  M  W

Working problem 4.1 on the new code without bypassing and forwarding, we obtain:

Clock Cycle #    1  2  3  4  5  6  7  8  9 10 11 12 13 14
Instruction 1    F  I  X  M  W
Instruction 2       F  S  S  I  X  M  W
Instruction 3                F  S  S  I  X  M  W
Instruction 4                         F  I  X  M  W
Instruction 5                            F  I  X  M  W
Instruction 6                               F  I  X  M  W

Working problem 4.1 on the new code with bypassing and forwarding, we obtain:

Clock Cycle #    1  2  3  4  5  6  7  8  9 10 11 12
Instruction 1    F  I  X  M  W
Instruction 2       F  I  S  X  M  W
Instruction 3          F  S  I  S  X  M  W
Instruction 4                F  S  I  X  M  W
Instruction 5                      F  I  X  M  W
Instruction 6                         F  I  X  M  W

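Reading the cycle of the last W off each profile gives the total cycle count for the original and the optimized code. The sketch below compares them, assuming the quantity of interest is the ratio of those totals; the `speedup` function and the `CYCLES` table are illustrative names, not part of the worksheet.

```python
# (original cycles, optimized cycles): the cycle of the final W in each profile.
CYCLES = {
    "3.2 without bypassing": (39, 19),
    "3.2 with bypassing": (25, 15),
    "4.1 without bypassing": (31, 14),
    "4.1 with bypassing": (19, 12),
}

def speedup(original_cycles, optimized_cycles):
    """Ratio of total cycles, assuming the clock period is unchanged."""
    return original_cycles / optimized_cycles

for case, (original, optimized) in CYCLES.items():
    ratio = speedup(original, optimized)
    print(f"{case}: {original} -> {optimized} cycles, ratio {ratio:.2f}")
```

The ratio is meaningful only if both versions run at the same clock rate, which the profiles implicitly assume.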
 

4.3

 

For problem 3.2 without bypassing:

 

For problem 3.2 with bypassing:

 

For problem 4.1 without bypassing:

 

For problem 4.1 with bypassing:

 
