# What About Pipelined Control?

- Should it be like single-cycle control?
  - But individual insn signals must be staged
- How many different control units do we need?
  - One for each insn in pipeline?
- Solution: use simple single-cycle control, but pipeline it
  - Single controller
- Key idea: pass control signals with instruction through pipeline

© 2009 Daniel J. Sorin from Roth, ECE 152

# Pipeline Performance Calculation

- Single-cycle
  - Clock period = 50ns, CPI = 1
  - Performance = 50ns/insn
- Pipelined
  - Clock period = 12ns (why not 10ns?)
  - CPI = 1 (each insn takes 5 cycles, but 1 completes each cycle)
  - Performance = 12ns/insn

# Why Does Every Insn Take 5 Cycles?

- Why not let add skip M and go straight to W?
  - It wouldn't help: peak fetch still only 1 insn per cycle
  - Structural hazards: not enough resources per stage for 2 insns

# Pipeline Hazards

- Hazard: condition leads to incorrect execution if not fixed
  - "Fixing" typically increases CPI
  - Three kinds of hazards
- Structural hazards
  - Two insns trying to use same circuit at same time
    - E.g., structural hazard on RegFile write port
  - Fix by proper ISA/pipeline design: 3 rules to follow
    - Each insn uses every structure exactly once
    - For at most one cycle
    - Always at same stage relative to F
- Data hazards (next)
- Control hazards (a little later)
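The numbers on the Pipeline Performance Calculation slide follow from one relation: time per insn = clock period × CPI. The sketch below simply reproduces that arithmetic; the function and variable names are illustrative and do not come from the slides.

```python
def time_per_insn_ns(clock_period_ns: float, cpi: float) -> float:
    """Average time per instruction: clock period times cycles per instruction."""
    return clock_period_ns * cpi


# Figures taken from the slide.
single_cycle = time_per_insn_ns(clock_period_ns=50.0, cpi=1.0)
pipelined = time_per_insn_ns(clock_period_ns=12.0, cpi=1.0)

# Throughput improves by the ratio of the two per-insn times.
speedup = single_cycle / pipelined

# Each insn still visits all 5 stages, so its own latency is 5 cycles.
pipelined_latency_ns = 5 * 12.0

print(f"single-cycle: {single_cycle} ns/insn")
print(f"pipelined:    {pipelined} ns/insn (speedup {speedup:.2f}x)")
print(f"latency of one pipelined insn: {pipelined_latency_ns} ns")
```

Note that pipelining improves throughput, not the latency of an individual insn: one insn takes longer end to end (60ns versus 50ns) even though one completes every 12ns.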
# Fixing Register Data Hazards

- Can only read register value 3 cycles after writing it
- One way to enforce this: make sure programs can't do it
  - Compiler puts two independent insns between write/read insn pair
    - If they aren't there already
  - Independent means: "do not interfere with register in question"
    - Do not write it: otherwise meaning of program changes
    - Do not read it: otherwise create new data hazard
  - Code scheduling: compiler moves around existing insns to do this
    - If none can be found, must use NOPs
- This is called **software interlocks**
  - MIPS: Microprocessor w/out Interlocking Pipeline Stages

# Software Interlock Example

```
add $3,$2,$1
lw $4,0($3)
sw $7,0($3)
add $6,$2,$8
addi $3,$5,4
```

- Can any of last 3 insns be scheduled between first two?
  - sw \$7,0(\$3)? No, creates hazard with add \$3,\$2,\$1
  - add \$6,\$2,\$8? OK
  - addi \$3,\$5,4? No, lw would read \$3 from it
  - Still need one more insn, use nop

```
add $3,$2,$1
add $6,$2,$8
nop
lw $4,0($3)
sw $7,0($3)
addi $3,$5,4
```

# Software Interlock Performance

- Software interlocks
  - Assume 20% of insns require insertion of 1 nop
  - Assume 5% of insns require insertion of 2 nops
- CPI is still 1 technically
- But now there are more insns
  - #insns = 1 + 0.20\*1 + 0.05\*2 = 1.3
  - 30% more insns (30% slowdown) due to data hazards

# Hardware Interlocks

- Problem with software interlocks? Not compatible
  - Where does 3 in "read register 3 cycles after writing" come from?
    - From structure (depth) of pipeline
  - What if next MIPS version uses a 7 stage pipeline?
    - Programs compiled assuming 5 stage pipeline will break
- A better (more compatible) way: hardware interlocks
  - Processor detects data hazards and fixes them
  - Two aspects to this
    - Detecting hazards
    - Fixing hazards
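The software-interlock rule (two independent insns between a register write and a read of that register) can be sketched as a small padding pass. This is a minimal illustration, not a real compiler pass: the `Insn` tuple, `MIN_DISTANCE`, and `insert_nops` are hypothetical names, register operands are listed by hand rather than parsed, and the pass only inserts nops without doing any code scheduling.

```python
from typing import List, NamedTuple, Optional, Tuple


class Insn(NamedTuple):
    text: str                 # assembly text, for display only
    dest: Optional[str]       # register written, or None
    srcs: Tuple[str, ...]     # registers read


NOP = Insn("nop", None, ())
MIN_DISTANCE = 3  # a reader must sit at least 3 slots after the writer


def insert_nops(program: List[Insn]) -> List[Insn]:
    """Pad with nops so no insn reads a register written fewer than 3 slots earlier."""
    out: List[Insn] = []
    for insn in program:
        needed = 0
        # Only the previous MIN_DISTANCE - 1 slots can be too close.
        for distance in range(1, MIN_DISTANCE):
            if distance > len(out):
                break
            prev = out[-distance]
            if prev.dest is not None and prev.dest in insn.srcs:
                needed = max(needed, MIN_DISTANCE - distance)
        out.extend([NOP] * needed)
        out.append(insn)
    return out


original = [
    Insn("add $3,$2,$1", "$3", ("$2", "$1")),
    Insn("lw $4,0($3)", "$4", ("$3",)),
    Insn("sw $7,0($3)", None, ("$7", "$3")),
    Insn("add $6,$2,$8", "$6", ("$2", "$8")),
    Insn("addi $3,$5,4", "$3", ("$5",)),
]
padded = insert_nops(original)

# The hand-scheduled version from the slide needs no further padding.
scheduled = [
    original[0], original[3], NOP, original[1], original[2], original[4],
]

for insn in padded:
    print(insn.text)
```

Because this sketch never reorders insns, it spends two nops on the example, whereas the scheduled version on the slide hides one of them behind `add $6,$2,$8`. Running `insert_nops` on the scheduled sequence leaves it unchanged, which is one way to check that a schedule satisfies the rule.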
# Pipeline Control Terminology

- Hardware interlock maneuver is called **stall** or **bubble**
- Mechanism is called **stall logic**
- Part of more general **pipeline control** mechanism
  - Controls advancement of insns through pipeline
- Distinguished from **pipelined datapath control**
  - Controls datapath at each stage
  - Pipeline control controls advancement of datapath control

# Pipeline Diagram with Data Hazards

- Data hazard stall indicated with d\*
- Stall propagates to younger insns

| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|-----------------|---|---|----|----|---|---|---|---|---|
| add \$3,\$2,\$1 | F | D | X | M | W | | | | |
| lw \$4,0(\$3) | | F | d* | d* | D | X | M | W | |
| sw \$6,4(\$7) | | | | | F | D | X | M | W |

- This is not OK (why?)

| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|-----------------|---|---|----|----|---|---|---|---|---|
| add \$3,\$2,\$1 | F | D | X | M | W | | | | |
| lw \$4,0(\$3) | | F | d* | d* | D | X | M | W | |
| sw \$6,4(\$7) | | | F | D | X | M | W | | |

# Hardware Interlock Performance

- Hardware interlocks: same as software interlocks
  - 20% of insns require 1 cycle stall (i.e., insertion of 1 nop)
  - 5% of insns require 2 cycle stall (i.e., insertion of 2 nops)
  - CPI = 1 + 0.20\*1 + 0.05\*2 = 1.3
- So, either CPI stays at 1 and #insns increases 30% (software)
- Or, #insns stays at 1 (relative) and CPI increases 30% (hardware)
- Same difference
- Anyway, we can do better
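The first diagram in Pipeline Diagram with Data Hazards can be reproduced with a toy model of stall logic. The sketch below rests on assumptions that the slides imply but do not spell out: no bypassing, a reader may sit in D in the same cycle its producer is in W, and a younger insn is shown in F in the cycle the insn ahead of it completes D. The names `Insn`, `stall_schedule`, and `render` are illustrative only.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Insn(NamedTuple):
    text: str
    dest: Optional[str]       # register written, or None
    srcs: Tuple[str, ...]     # registers read


def stall_schedule(program: List[Insn]) -> List[Dict[int, str]]:
    """Return one {cycle: stage label} map per insn for an in-order F/D/X/M/W pipeline."""
    rows: List[Dict[int, str]] = []
    write_cycle: Dict[str, int] = {}  # register -> cycle its latest writer is in W
    prev_decode = 0
    for insn in program:
        # A younger insn cannot leave F until the older one leaves D.
        fetch = prev_decode if rows else 1
        # Without bypassing, D must wait for every producer to reach W.
        ready = max((write_cycle.get(r, 0) for r in insn.srcs), default=0)
        decode = max(fetch + 1, ready)
        row = {fetch: "F"}
        for cycle in range(fetch + 1, decode):
            row[cycle] = "d*"  # data hazard stall
        for offset, stage in enumerate("DXMW"):
            row[decode + offset] = stage
        if insn.dest is not None:
            write_cycle[insn.dest] = decode + 3
        prev_decode = decode
        rows.append(row)
    return rows


def render(program: List[Insn], rows: List[Dict[int, str]]) -> str:
    """Format the schedule as a plain-text pipeline diagram."""
    last = max(max(row) for row in rows)
    lines = []
    for insn, row in zip(program, rows):
        cells = " ".join(f"{row.get(c, ''):>2}" for c in range(1, last + 1))
        lines.append(f"{insn.text:<14} {cells}")
    return "\n".join(lines)


program = [
    Insn("add $3,$2,$1", "$3", ("$2", "$1")),
    Insn("lw $4,0($3)", "$4", ("$3",)),
    Insn("sw $6,4($7)", None, ("$6", "$7")),
]
rows = stall_schedule(program)
print(render(program, rows))

# CPI with the stall fractions assumed on the performance slide.
cpi = 1 + 0.20 * 1 + 0.05 * 2
print(f"CPI = {cpi:.2f}")
```

In this model `sw` cannot be fetched past the stalled `lw`, which is the "stall propagates to younger insns" behavior shown in the first diagram.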
# Pipeline Diagram With Bypassing

| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|-----------------|---|---|---|----|---|---|---|---|---|
| add \$3,\$2,\$1 | F | D | X | M | W | | | | |
| lw \$4,0(\$3) | | F | D | X | M | W | | | |
| addi \$6,\$4,1 | | | F | d* | D | X | M | W | |

- Sometimes you will see it like this
  - Denotes that stall logic implemented at X stage, rather than D
  - Equivalent, doesn't matter when you stall as long as you do

| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|-----------------|---|---|---|---|----|---|---|---|---|
| add \$3,\$2,\$1 | F | D | X | M | W | | | | |
| lw \$4,0(\$3) | | F | D | X | M | W | | | |
| addi \$6,\$4,1 | | | F | D | d* | X | M | W | |

# Pipeline Diagram with Multiplier

| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|-----------------|---|---|----|----|----|----|---|---|---|
| mul \$4,\$3,\$5 | F | D | P0 | P1 | P2 | P3 | W | | |
| addi \$6,\$4,1 | | F | d* | d* | d* | D | X | M | W |

- This is the situation that slide #48 logic tries to avoid
  - Two instructions trying to write RegFile in same cycle

| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|------------------|---|---|----|----|----|----|---|---|---|
| mul \$4,\$3,\$5 | F | D | P0 | P1 | P2 | P3 | W | | |
| addi \$6,\$1,1 | | F | D | X | M | W | | | |
| add \$5,\$6,\$10 | | | F | D | X | M | W | | |
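The RegFile write-port collision in the last diagram can be found mechanically by computing each insn's W cycle. The sketch below is a simplified check, not the logic from slide #48: it assumes every listed insn writes the register file, that `mul` spends four cycles in P0..P3 while other insns spend two in X and M, and that D cycles are given as inputs. All names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

EXEC_STAGES = {"mul": 4}   # P0..P3, as drawn in the diagram
DEFAULT_EXEC_STAGES = 2    # X and M


def writeback_cycle(opcode: str, decode_cycle: int) -> int:
    """Cycle in which an insn occupies W, given the cycle it completes D."""
    return decode_cycle + EXEC_STAGES.get(opcode, DEFAULT_EXEC_STAGES) + 1


def write_port_conflicts(program: List[Tuple[str, int]]) -> Dict[int, List[str]]:
    """Map each cycle with more than one RegFile writer to the insns involved.

    program is a list of (assembly text, D cycle) pairs in program order.
    """
    writers: Dict[int, List[str]] = defaultdict(list)
    for text, decode_cycle in program:
        opcode = text.split()[0]
        writers[writeback_cycle(opcode, decode_cycle)].append(text)
    return {cycle: texts for cycle, texts in writers.items() if len(texts) > 1}


# D cycles read off the diagram: mul in cycle 2, addi in 3, add in 4.
program = [
    ("mul $4,$3,$5", 2),
    ("addi $6,$1,1", 3),
    ("add $5,$6,$10", 4),
]
conflicts = write_port_conflicts(program)
print(conflicts)
```

Stall logic could use a check of this kind in D and hold the younger insn for a cycle whenever its W slot is already claimed; that policy is a design choice and is not taken from the slides.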
# More Multiplier Nasties

- This is the situation that slide #49 logic tries to avoid
  - Mis-ordered writes to the same register
  - Compiler thinks add gets \$4 from addi, actually gets it from mul

| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|------------------|---|---|----|----|----|----|---|---|---|
| mul \$4,\$3,\$5 | F | D | P0 | P1 | P2 | P3 | W | | |
| addi \$4,\$1,1 | | F | D | X | M | W | | | |
| ... | | | | | | | | | |
| add \$10,\$4,\$6 | | | | | F | D | X | M | W |

- Multi-cycle operations complicate pipeline logic
  - They're not impossible, but they require more complexity
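Mis-ordered writes can be detected the same way, by comparing W cycles of insns that target the same register. The sketch below is an illustration under stated assumptions rather than the logic from slide #49: `mul` occupies four execute stages (P0..P3), other insns two (X and M), and D cycles are supplied by hand. The function and variable names are hypothetical.

```python
from typing import List, Optional, Tuple

EXEC_STAGES = {"mul": 4}   # P0..P3, as drawn in the diagram
DEFAULT_EXEC_STAGES = 2    # X and M


def writeback_cycle(text: str, decode_cycle: int) -> int:
    """Cycle in which an insn occupies W, given the cycle it completes D."""
    opcode = text.split()[0]
    return decode_cycle + EXEC_STAGES.get(opcode, DEFAULT_EXEC_STAGES) + 1


def misordered_writes(
    program: List[Tuple[str, Optional[str], int]]
) -> List[Tuple[str, str, str]]:
    """Return (older, younger, register) triples where the younger insn writes first.

    program is a list of (assembly text, destination register, D cycle)
    in program order.
    """
    problems = []
    for i, (old_text, old_dest, old_decode) in enumerate(program):
        if old_dest is None:
            continue
        for young_text, young_dest, young_decode in program[i + 1:]:
            if young_dest != old_dest:
                continue
            if writeback_cycle(young_text, young_decode) < writeback_cycle(old_text, old_decode):
                # The older insn's value lands last and overwrites the newer one.
                problems.append((old_text, young_text, old_dest))
    return problems


# D cycles read off the diagram: mul in 2, addi in 3, add in 6.
program = [
    ("mul $4,$3,$5", "$4", 2),
    ("addi $4,$1,1", "$4", 3),
    ("add $10,$4,$6", "$10", 6),
]
problems = misordered_writes(program)
print(problems)
```

Here `addi` reaches W in cycle 6 and `mul` in cycle 7, so the register file ends up holding the `mul` result even though `addi` is later in program order.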