2.- The instruction scheduler tries to rearrange the code so that many expressions are computed concurrently. By computing expressions concurrently, however, more registers are needed resulting in what's called register pressure.
- How - see book 233 Why - avoid pipeline stalls
-
(sigma(v) + i) mod II - Loop's can use their own indices
Terminology
Matrix distance
Let's say we have the following:
float a[N][M], b[N][M], c[N][M];
void foo()
{
int i, j;
for (i = 2; i < 100; i++) {
for (j = 2 + 3 * i; j < 1000; j++) {
a[ i ][ j ] = [ i - 2 ][ j + 3 ];
b[ i ][ j ] = [ i ][ j - 2 ];
c[ i ][ j ] = [ i + 1 ][ j + 2 ];
}
}
}
Finding the distance matrix of the loop above is done as follows.
We start of with calculating the distance vector for the a vector.
IA + a0 = JB + b0
where
A = [1 0; 0 1]
a0 = [0 0]
B = [1 0; 0 1]
b0 = [-2 3]
We then re-write IA + a0 = JB + b0 to:
[I J][A; -B] = [i1 j1 i2 j2][A; -B] = b0 - a0
For the a vector this results in:
[ 1 0 ]
[i1 j1 i2 j2]| 0 1 | = [-2 3] - [0 0]
| -1 0 |
[ 0 -1 ]
Which in turn gives us:
(i1 - i2, j1 - j2) = (-2, 3)
if I = (t1, t2) =>
i1 - i2 = -2 => i2 = t1 + 2
j1 - j2 = 3 => j2 = t2 - 3
=>
J = (t1 + 2, t2 - 3)
=>
d_a = (2, -3)
In the case of the last vector's distance vector d_c we get a vector
that's negative when first examined. However we want a vector that's
lexicographically positive. As such we must check whether I-J or J-I
gives a positive distance vector. In this case J - I gives a positive
vector d_c = (1, 2).
TIP!
If both (x, y) are positive then the read occurs before the write.
Hyperplane Method
The Hyperplane Method is a way to enable parallel execution of all loops
except the outermost. Given a perfect loop nest L and a distance matrix
D, the method finds a m x m unimodular matrix U, this matrix carries
all the dependencies in the outermost loop L1. All inner loops L2..Lm
can be executed in parallel, as long as all elements in the first column
are >= 1.
Determine U by using the following system, where it is required that the
first column is (for all elements) >= 1.
d1 x u >= 1 .... dn x u >= 1, where u = (u1, u2... um), the first
column of U.
U can be determined by
| u1 1 0 0 |
U = [ u; [ I_(m-1); zeroes ] ] = | u2 0 1 0 |
| u3 0 0 1 |
| um 0 0 0 |
Visa påGitHub