When I compile this snippet (with -Ofast -floop-nest-optimize), gcc generates assembly that traverses the array in source order. However, if I uncomment the line // n = 32767 and assign any number to n, it interchanges the loops so that the access becomes x[i * n + j]. Traversing memory in contiguous row-major order is much more cache-friendly than striding down columns.

    float matrix_sum_column_major(float* x, int n) {
        // n = 32767;
        float sum = 0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                sum += x[j * n + i];
        return sum;
    }

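For reference, this is roughly the transformation I would expect, written out by hand. It is only a sketch: matrix_sum_interchanged is a name I made up for illustration, and the const qualifier is my addition. The loops are swapped so that the inner loop touches consecutive floats.

```c
/* Original order from the question: the inner loop walks down a column,
   so consecutive accesses are n floats apart. */
float matrix_sum_column_major(const float *x, int n) {
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            sum += x[j * n + i];
    return sum;
}

/* Hand-interchanged version (hypothetical name): j is now the outer loop,
   so the inner loop reads x[j * n + 0], x[j * n + 1], ... contiguously. */
float matrix_sum_interchanged(const float *x, int n) {
    float sum = 0.0f;
    for (int j = 0; j < n; j++)
        for (int i = 0; i < n; i++)
            sum += x[j * n + i];
    return sum;
}
```

Both functions add the same n * n elements, only in a different order. Since float addition is not associative, the two results may differ in the last bits for general inputs, so I assume the compiler needs something like the fast-math mode implied by -Ofast before it may reorder the sum like this.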
Why can't GCC or clang do loop interchange when the size is a runtime int variable? Real-world code usually won't have the size available as a compile-time constant.
PS: I've tried this with different versions of gcc and with clang-9, and it seems to happen in both.
PS2: Even if I make x a local variable malloced inside the function, it still happens.