COMP226 Forwarding

Mike didn't spend as much time on data hazards as he would have liked to, so here are some notes to help you, especially so as to help you when you are doing Tutorial 6.

Consider the following code (you might have seen something like it somewhere before -- do you remember?).

I'm going to put the pipeline stages next to the code in the arrangement we used in lectures to show how each instruction is executed as time progresses from left to right. At first, only the IF stage of the first instruction is active, but then both the IF stage of the second instruction and the ID stage of the first instruction are active together. Get it? Next the IF stage of the third instruction and the ID stage of the second instruction and the EX stage of the first instruction will all be happening at the same time. And so on...

ld	[%r1],%r2     IF  ID  EX  MEM WB
ld	[%r1+4],%r5       IF  ID  EX  MEM WB
add	%r2,%r5,%r2           IF  ID  EX  MEM WB
ld	[%r1+8],%r5               IF  ID  EX  MEM WB
add	%r2,%r5,%r2                   IF  ID  EX  MEM WB
ld	[%r1+12],%r5                      IF  ID  EX  MEM WB
add	%r2,%r5,%r2                           IF  ID  EX  MEM WB
add	%r2,1,%r2                                 IF  ID  EX  MEM WB
ta 	0

Now, consider what happens in the first add instruction. There are two so called data dependencies -- the add instruction needs the values from registers 2 and 5, but only the values that are there after the preceding two load instructions have put them there (not some old values that might have been left over from an old calculation). So the add instruction depends upon those two load instructions.

Now let's check what happens with the pipeline. The first load instruction correctly puts its value in r2 during its write back (WB) stage. OK. But the add instruction needs the value (according to the execution sequence for ALIs we developed in lectures) during its decode stage. That's a big problem, because the decode stage of the add instruction occurs immediately before the WB stage of the first load instruction. That's a hazard! What we need to do is stall the pipeline so that the code in fact appears as follows.

ld	[%r1],%r2     IF  ID  EX  MEM WB
ld	[%r1+4],%r5       IF  ID  EX  MEM WB
add	%r2,%r5,%r2                   IF  ID  EX  MEM WB
ld	[%r1+8],%r5                       IF  ID  EX  MEM WB
add	%r2,%r5,%r2                           IF  ID  EX  MEM WB
ld	[%r1+12],%r5                              IF  ID  EX  MEM WB
add	%r2,%r5,%r2                                   IF  ID  EX  MEM WB
add	%r2,1,%r2                                         IF  ID  EX  MEM WB
ta 	0

Everything has to wait until the value is ready. In this case it took two stalls, since there are two empty stages in the pipeline now.

But it gets worse. Even with those two stalls, the add instruction won't work because the r5 value isn't in place in time!

Now it's your turn. What do we have to do to make sure that the dependency between the second load and the add works properly?

Are there other dependencies in this code? How well will the pipeline work if you have to introduce stalls to take care of all the dependencies? How many cycles will it take to complete the above code with all the dependencies resolved by stalls? And how does that compare with the number of cycles we thought we could do it in when we just looked at the code first time (top of this page -- pipeline stages without any stalls).

And for a really good exercise, suppose when we start running this code that register 2 contains 2, and register 5 contains 5, and register 1 points to the start of the data section in the datafile used in lecture 1 (the one that just contained 5, 4, 3, 2, 1 and 0 as words). What will the program get at the end in register 2 if it runs as written, fully pipelined, without any stalls (and therefore with lots of hazards doing curious things)?

OK. Now let me tell you the good news. Look again at the top diagram. To make it easier, I'll repeat the beginning of it here:

ld	[%r1],%r2     IF  ID  EX  MEM WB
ld	[%r1+4],%r5       IF  ID  EX  MEM WB
add	%r2,%r5,%r2           IF  ID  EX  MEM WB
ld	[%r1+8],%r5               IF  ID  EX  MEM WB

Notice that the first load has its answer for r2 ready at the end of its MEM stage (then in the next stage it goes ahead and puts that answer in r2). Notice that the add only really really needs the new value that's meant to come from r2 at the beginning of its EX stage. And notice that those two moments, the end of the first load's MEM stage and the beginning of the add's EX stage, are exactly the same moment in time. So, if we could put some circuitry that could forward the value from end of the MEM stage of the first load, to the beginning of the EX stage of the add, we wouldn't need to stall. The dependency between those two instructions (first and third) would work properly.

I'll show you in week 12 how we can actually build forwarding circuitry without much trouble. For now, you only have to think when can we do it and when can't we. If you think carefully you'll see that the second load can't forward it's answer to the add in time. (Why not?) So, the code just shown still needs one stall, but only one, if we include forwarding circuitry.

OK, if you've understood that, I think you're ready to try tutorial 6. Good luck with it, Mike.

Mike, August 2013.

COMP226 home page