I noticed that when vectorizing a loop in a C program, the speedup achieved is much greater with float operands than with double operands.
Example:
```c
for (int i = 0; i < N; i++) { a[i] += b[i] * c[i]; }
```
With a, b and c being arrays of 20,000 elements each, repeating this loop 1,000,000 times gives:
- Without vectorization it takes around 24 seconds with both floats and doubles
- With auto-vectorization (compiling with `-O1 -ftree-vectorize`) it takes 7 seconds with floats and 21 seconds with doubles
- With OpenMP (`#pragma omp simd`) the timings are similar to the auto-vectorized case
What could be the reason for this?
Edit: Further information:
- Processor: Intel Core i7-2677M CPU @ 1.80GHz
- The surrounding code is nothing but the array allocations (using `calloc`) and a loop that fills the arrays b and c with constant values.