
I-Unit Pipelining Examples -  Summary

Non-Pipelining:

  • Non-Pipelining takes 111 cycles. 16 instructions at 6 cycles each is 96 cycles. Add 7 cycles for the first cache miss, 5 cycles for the other cache miss, 1 extra cycle for the multiply in "2 ( a + b + 4 )", and 2 extra cycles for the multiply in "3 ( a + b + 6 )". The total is 111 cycles:
  •   16  # Number of Instructions
     * 6  # Number of cycles per Instruction
      96
       7  # Cache Miss for line at 3000
       5  # Cache Miss for line at 2000 (2 less because stores are released early)
       1  # Extra cycle to calculate first multiply
     + 2  # Extra cycles to calculate second multiply
     111  # Total number of cycles
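
The tally above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function and variable names are illustrative assumptions, and the cycle counts are taken directly from the listing.

```python
# Hypothetical names; every number below comes from the tally above.
INSTRUCTIONS = 16
CYCLES_PER_INSTRUCTION = 6

# Extra cycles on top of the base cost of each instruction.
PENALTIES = {
    "cache miss, line at 3000": 7,
    "cache miss, line at 2000": 5,  # 2 less because stores are released early
    "first multiply": 1,
    "second multiply": 2,
}


def non_pipelined_cycles(instructions, cycles_per_instruction, penalties):
    """Each instruction runs to completion before the next one starts."""
    return instructions * cycles_per_instruction + sum(penalties.values())


total = non_pipelined_cycles(INSTRUCTIONS, CYCLES_PER_INSTRUCTION, PENALTIES)
print(total)  # 111
```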

Pipelining:

  • The Pipelining example took 60 cycles. Ideally, with the instructions overlapped in the pipeline, one instruction would have completed in each cycle once the pipeline was full. The sequence would then have completed in the following number of cycles:
  •   16  # Number of Instructions
       5  # Number of cycles it takes to get the Pipeline started
       7  # Cache Miss for line at 3000
       5  # Cache Miss for line at 2000 (2 less because stores are released early)
       1  # Extra cycle to calculate first multiply
     + 2  # Extra cycles to calculate second multiply
      36  # Total number of cycles
    But it didn't. It took 60 cycles instead of 36 because some of the instructions depended on the results of instructions that were still being executed. For example, the second instruction, at cycle 004, had to wait in the A-Stage for 3 cycles. Its address generation depended on the value of GPR1, which was being modified by an instruction further down the pipe, so the instruction in the A-Stage had to wait until GPR1 held the new value.
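
The gap between the ideal and the observed cycle counts can be worked out as follows. This is a minimal sketch under the assumptions stated in the text: one instruction completes per cycle once the pipeline is full, 5 cycles are needed to get the pipeline started, and the same cache-miss and multiply penalties apply. The function and variable names are hypothetical.

```python
def ideal_pipelined_cycles(instructions, startup_cycles, penalties):
    """Ideal count: one completion per cycle, plus startup and fixed penalties."""
    return instructions + startup_cycles + sum(penalties)


# Penalties from the tally: two cache misses and two multiplies.
penalties = [7, 5, 1, 2]

ideal = ideal_pipelined_cycles(instructions=16, startup_cycles=5, penalties=penalties)
observed = 60  # cycle count reported for the Pipelining example

# Cycles not accounted for by the ideal model; the text attributes these
# to instructions waiting on results still in the pipe.
unexplained = observed - ideal

print(ideal)        # 36
print(unexplained)  # 24
```

The 3-cycle wait of the second instruction in the A-Stage is one contribution to those 24 cycles.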

Comparison:

  • Since a 6-stage pipeline was used, ideally there would have been a factor of 6 improvement in performance. In fact, the performance only improved by a factor of 1.85 (111 cycles versus 60 cycles). That is nowhere near the factor of 6 we'd like to get, but overall the addition of this single concept almost doubled the performance. Don't worry, with bypassing the performance will improve to a factor of 3.08 over the non-pipelined example. That is still nowhere near the factor of 6. Ideally we would have also included an example of EPIC, which for this program would have gotten us near or perhaps even beyond the factor of 6.
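
The improvement factors quoted here follow from dividing cycle counts. The sketch below uses hypothetical helper names; the cycle counts, the 6-stage ideal, and the 3.08 bypassing factor are the ones stated in the text.

```python
def speedup(baseline_cycles, improved_cycles):
    """Improvement factor of one run over another, by cycle count."""
    return baseline_cycles / improved_cycles


def fraction_of_ideal(factor, pipeline_stages):
    """How much of the ideal stage-count speedup was actually achieved."""
    return factor / pipeline_stages


NON_PIPELINED = 111  # cycles, Non-Pipelining example
PIPELINED = 60       # cycles, Pipelining example
STAGES = 6           # ideal improvement factor for a 6-stage pipeline

pipelined_factor = speedup(NON_PIPELINED, PIPELINED)
print(round(pipelined_factor, 2))                             # 1.85
print(round(fraction_of_ideal(pipelined_factor, STAGES), 2))  # 0.31

# The bypassing factor is quoted from the text, not computed here.
BYPASSING_FACTOR = 3.08
print(round(fraction_of_ideal(BYPASSING_FACTOR, STAGES), 2))  # 0.51
```

By this measure, pipelining alone reaches about a third of the ideal factor of 6, and bypassing about half.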