Assume that there is a degree-2 superscalar version of our 5-stage pipeline system that is capable of fetching and decoding pairs of instructions in parallel. The decoded instructions are placed into a 4-slot instruction window from which up to 2 instructions can be selected to enter the execute stage in a single cycle. The execute stage contains two identical ALUs. Instructions are held in the instruction window until the inputs they require have been written or until the inputs are available for forwarding. Instructions stall in the decode stage if there is no room in the instruction window. As usual, each pipeline stage requires one clock cycle.
How many clock cycles would be required to execute the following instruction sequence on this superscalar system if out-of-order instruction issue and out-of-order completion are allowed? Use the table below to show the location of each instruction within the pipeline for each of the clock cycles. Include as many additional rows as you need.
lw $8,4($6)
add $3,$8,$4
sub $11,$10,$7
slt $14,$9,$10
sll $2,$14,2
Cycle   Fetch   Decode   Execute   Memory   Write Back
1
2
3
4
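
Before filling in the table, it can help to list which instructions must wait in the instruction window for an earlier result. The following is a minimal Python sketch of that read-after-write check, not a pipeline simulator. The helper names `dest_and_sources` and `raw_dependencies` are hypothetical, and the parsing assumes that the first register in each instruction is its destination, which holds for the five instructions above but not for stores or branches.

```python
import re

# Instruction sequence from the problem statement.
PROGRAM = [
    "lw $8,4($6)",
    "add $3,$8,$4",
    "sub $11,$10,$7",
    "slt $14,$9,$10",
    "sll $2,$14,2",
]


def dest_and_sources(instr):
    """Split an instruction into (destination register, source registers).

    Assumption: the first register listed is the destination. This is true
    for the instructions in PROGRAM, but not for stores or branches.
    """
    regs = re.findall(r"\$\d+", instr)
    return regs[0], regs[1:]


def raw_dependencies(program):
    """Return (producer_index, consumer_index, register) triples."""
    deps = []
    last_writer = {}  # register -> index of the latest instruction writing it
    for i, instr in enumerate(program):
        dest, sources = dest_and_sources(instr)
        for reg in sources:
            if reg in last_writer:
                deps.append((last_writer[reg], i, reg))
        last_writer[dest] = i
    return deps


for producer, consumer, reg in raw_dependencies(PROGRAM):
    print(f"{PROGRAM[consumer]!r} needs {reg} from {PROGRAM[producer]!r}")
```

Instructions that do not appear as a consumer in this list have no pending inputs from earlier instructions in the sequence, so they are the candidates for issuing ahead of a stalled instruction.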