| Score: | Name: |
|--------|-------|
|        |       |

## ECE 3055 Quiz 5 Wednesday, September 23

The program below is executed on the 5-stage pipelined MIPS processor design described in Chapter 4. Answer the following questions about this program.

| loop_top: | lw   | \$4,100(\$5)     |
|-----------|------|------------------|
|           | sub  | \$5,\$5,\$3      |
|           | add  | \$3,\$6,\$4      |
|           | sll  | \$2,\$5,3        |
|           | sw   | \$3,200(\$3)     |
|           | beq  | \$2,\$3,foobar   |
|           | addi | \$5,\$5,2        |
|           | slt  | \$5,\$5,\$18     |
| foobar:   | or   | \$18,\$5,\$9     |
|           | lw   | \$8,100(\$18)    |
|           | beq  | \$8,\$0,loop_top |

Assume the control unit **does not have** any hazard detection, forwarding, a new branch compare circuit, or automatic branch flushing, and that the register file **will not** write and then read a new register value in one clock cycle. Rewrite the code sequence by adding the minimum number of NOP instructions (*do not reorder or change instructions*) to eliminate all potential data and branch hazards. Assume other non-NOP instructions follow the last branch in the original code sequence above.
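
The spacing rule behind this part can be sketched in Python. This is a minimal sketch under stated assumptions, not a solution to the quiz: the `Instr` tuple and the `insert_nops` function are hypothetical names, and the three-instruction program is an illustrative example that does not come from the quiz. It assumes `gap=3` (with no forwarding and no write-then-read in one cycle, a consumer must trail its producer by three instruction slots) and `branch_nops=3` (the branch is assumed to be resolved in the MEM stage, as in the textbook's base design). It handles straight-line order only and ignores hazards along a taken-branch path.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction record used only for this sketch."""
    text: str                 # assembly text, e.g. "add $3,$1,$4"
    dest: Optional[int]       # register written, or None
    srcs: Tuple[int, ...]     # registers read
    is_branch: bool = False


def insert_nops(program: List[Instr], gap: int = 3, branch_nops: int = 3) -> List[str]:
    """Pad a straight-line program with NOPs (assumed spacing rules).

    gap: slots required between a register write and a later read.
    branch_nops: slots left empty after every branch.
    """
    out: List[str] = []
    last_write = {}  # register number -> slot index of its latest writer

    for ins in program:
        # Earliest slot at which every source register is safe to read.
        earliest = 0
        for reg in ins.srcs:
            if reg != 0 and reg in last_write:  # $0 never causes a hazard
                earliest = max(earliest, last_write[reg] + gap + 1)

        # Fill with NOPs until that slot is reached.
        while len(out) < earliest:
            out.append("nop")

        if ins.dest is not None and ins.dest != 0:
            last_write[ins.dest] = len(out)
        out.append(ins.text)

        # Keep the slots after a branch empty until it resolves.
        if ins.is_branch:
            out.extend(["nop"] * branch_nops)

    return out


# Illustrative program (not the quiz program).
program = [
    Instr("lw $1,0($2)", dest=1, srcs=(2,)),
    Instr("add $3,$1,$4", dest=3, srcs=(1, 4)),
    Instr("beq $3,$0,done", dest=None, srcs=(3, 0), is_branch=True),
]
result = insert_nops(program)
```

In this example `result` holds three NOPs before `add` (waiting on `$1`), three before `beq` (waiting on `$3`), and three after `beq`. NOPs that follow a branch also count as spacing for any later data dependence, which is why the sketch tracks slot positions rather than counting NOPs per instruction pair.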



Next, assume the control unit is fully improved as outlined in the text: hazard detection and forwarding units are added, automatic branch flushing is added with a new compare unit in the decode stage along with forwarding muxes, and the register file writes and then reads a new value in a single clock cycle. Determine the number of clock cycles required to complete the first loop execution of the original code sequence (i.e., the code in the loop executes, the last branch returns to the top of the loop, and the processor is just ready to fetch `lw` again). Assume the inner branch is **not** taken.

If there were no hazards or branch flushing, the original program would require only \_\_\_\_ clock cycles for a single loop execution (do not include the time to initially fill the pipeline at power up).

But this program will need to stall and/or flush the pipeline for an additional \_\_\_\_ clock cycles, so a total of \_\_\_\_ clock cycles is required for execution (do not include the time to initially fill the pipeline).

This program achieves a CPI (clocks per instruction) of \_\_\_\_.
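
The three quantities asked for here are linked by a simple relation, written out below as a sketch. The symbols $N$ (instructions executed in one pass through the loop) and $S$ (extra stall and flush cycles) are introduced here for illustration and are not notation from the quiz. With the pipeline already full, an ideal pipeline completes one instruction per clock, so the hazard-free cycle count equals $N$:

$$
\text{total cycles} = N + S, \qquad \text{CPI} = \frac{\text{total cycles}}{N} = 1 + \frac{S}{N}
$$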