Consider the following code (you might have seen something like it somewhere before -- do you remember?).
I'm going to put the pipeline stages next to the code in the arrangement we used in lectures to show how each instruction is executed as time progresses from left to right. At first, only the IF stage of the first instruction is active, but then both the IF stage of the second instruction and the ID stage of the first instruction are active together. Get it? Next the IF stage of the third instruction and the ID stage of the second instruction and the EX stage of the first instruction will all be happening at the same time. And so on...
ld [%r1],%r2 IF ID EX MEM WB ld [%r1+4],%r5 IF ID EX MEM WB add %r2,%r5,%r2 IF ID EX MEM WB ld [%r1+8],%r5 IF ID EX MEM WB add %r2,%r5,%r2 IF ID EX MEM WB ld [%r1+12],%r5 IF ID EX MEM WB add %r2,%r5,%r2 IF ID EX MEM WB add %r2,1,%r2 IF ID EX MEM WB ta 0
Now, consider what happens in the first add instruction. There are two
so called
Now let's check what happens with the pipeline. The first load instruction
correctly puts its value in r2 during its write back (WB) stage. OK. But
the add instruction needs the value (according to the execution sequence
for ALIs we developed in lectures) during its decode stage. That's a big
problem, because the decode stage of the add instruction occurs immediately
before the WB stage of the first load instruction. That's a
ld [%r1],%r2 IF ID EX MEM WB ld [%r1+4],%r5 IF ID EX MEM WB add %r2,%r5,%r2 IF ID EX MEM WB ld [%r1+8],%r5 IF ID EX MEM WB add %r2,%r5,%r2 IF ID EX MEM WB ld [%r1+12],%r5 IF ID EX MEM WB add %r2,%r5,%r2 IF ID EX MEM WB add %r2,1,%r2 IF ID EX MEM WB ta 0Everything has to wait until the value is ready. In this case it took two stalls, since there are two empty stages in the pipeline now.
But it gets worse. Even with those two stalls, the add instruction won't work because the r5 value isn't in place in time!
Now it's your turn. What do we have to do to make sure that the dependency between the second load and the add works properly?
Are there other dependencies in this code? How well will the pipeline work if you have to introduce stalls to take care of all the dependencies? How many cycles will it take to complete the above code with all the dependencies resolved by stalls? And how does that compare with the number of cycles we thought we could do it in when we just looked at the code first time (top of this page -- pipeline stages without any stalls).
And for a really good exercise, suppose when we start running this code that register 2 contains 2, and register 5 contains 5, and register 1 points to the start of the data section in the datafile used in lecture 1 (the one that just contained 5, 4, 3, 2, 1 and 0 as words). What will the program get at the end in register 2 if it runs as written, fully pipelined, without any stalls (and therefore with lots of hazards doing curious things)?
OK. Now let me tell you the good news. Look again at the top diagram. To make it easier, I'll repeat the beginning of it here:
ld [%r1],%r2 IF ID EX MEM WB ld [%r1+4],%r5 IF ID EX MEM WB add %r2,%r5,%r2 IF ID EX MEM WB ld [%r1+8],%r5 IF ID EX MEM WBNotice that the first load has its answer for r2 ready at the end of its MEM stage (then in the next stage it goes ahead and puts that answer in r2). Notice that the add only really really needs the new value that's meant to come from r2 at the beginning of its EX stage. And notice that those two moments, the end of the first load's MEM stage and the beginning of the add's EX stage, are exactly the same moment in time. So, if we could put some circuitry that could
I'll show you in week 12 how we can actually build forwarding circuitry without much trouble. For now, you only have to think when can we do it and when can't we. If you think carefully you'll see that the second load can't forward it's answer to the add in time. (Why not?) So, the code just shown still needs one stall, but only one, if we include forwarding circuitry.
OK, if you've understood that, I think you're ready to try tutorial 6. Good luck with it, Mike.
| Mike, August 2013. | COMP226 home page |