I have a few questions about parallelisation using OpenMP.
Say I have a program containing a nested for loop. From my understanding of the directive #pragma omp parallel for, the outer iteration counter is automatically privatised. Is the same true for the inner iteration counter? It appears to be, as the outputs are identical whether or not I state it explicitly.
Is it necessary (or safer) to explicitly privatise iteration counters for for loops within a parallel for block?
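For reference, here is the pattern I am asking about, sketched two ways; the function and variable names are placeholders, not my real code:

    #include <stdint.h>

    /* Placeholder sketch A: inner counter declared at function scope and
       explicitly listed in the private clause. */
    void variant_a(double *a, int64_t n, int64_t m)
    {
        int64_t i, j;
        #pragma omp parallel for private(j)
        for (i = 0; i < n; i++) {
            for (j = 0; j < m; j++) {
                a[i * m + j] *= 2.0;
            }
        }
    }

    /* Placeholder sketch B: inner counter declared in the loop header (C99),
       so each thread gets its own block-scoped copy without any clause. */
    void variant_b(double *a, int64_t n, int64_t m)
    {
        int64_t i;
        #pragma omp parallel for
        for (i = 0; i < n; i++) {
            for (int64_t j = 0; j < m; j++) {
                a[i * m + j] *= 2.0;
            }
        }
    }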
I am compiling with GCC. I found some unhelpful crosstalk between threads when using GCC 5.4.0, but not when using GCC 7.5.0. Adding private(foo, bar) to the directive resolved this, but I am curious why it works without that clause under GCC 7.5.0. Does GCC 7.5.0 automatically identify race conditions/crosstalk and privatise the variables it thinks should be private?
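To make "crosstalk" concrete, this is a minimal sketch (hypothetical names, not my real code) of the shape of loop where I saw it. If I am reading the default data-sharing rules correctly, the inner counter here is shared, so every thread writes to the same variable:

    #include <stdint.h>

    /* Hypothetical reproducer: 'j' lives at function scope and is not in a
       private clause, so (as I understand the defaults) all threads share it. */
    void racy(double *in, double *out, int64_t n, int64_t m)
    {
        int64_t i, j;
        #pragma omp parallel for   /* 'i' is the construct's loop variable, hence private */
        for (i = 0; i < n; i++) {
            double acc = 0.0;
            for (j = 0; j < m; j++) {   /* every thread increments the same 'j' */
                acc += in[j * n + i];
            }
            out[i] = acc;
        }
    }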
Other than allocating a few additional memory addresses, is there any significant overhead cost to privatising variables? I suspect the answer is 'yes, but (in my case) negligible'. The target audience for this code will be using systems with tens to hundreds of cores.
Here is a toy example, which finds, for each sample, the maximum value across all cells of an array:
    void find_max(double *inArr, double *outArr, int64_t nSamps, int64_t nCells, int64_t threads)
    {
        double maxVal, curVal;
        int64_t t, cell;    /* t drives the parallel loop; cell is the inner, sequential counter */
        #pragma omp parallel for private(maxVal, curVal) num_threads(threads)
        for (t = 0; t < nSamps; t++) {
            maxVal = inArr[t];
            for (cell = 1; cell < nCells; cell++) {
                curVal = inArr[cell * nSamps + t];
                if (curVal > maxVal) {
                    maxVal = curVal;
                }
            }
            outArr[t] = maxVal;
        }
    }
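For completeness, here is roughly how I exercise it standalone; this harness is just a simplified stand-in for the real Python caller, with arbitrary sizes and test values:

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    void find_max(double *, double *, int64_t, int64_t, int64_t);  /* defined above */

    /* Fill a small nCells x nSamps array, run find_max, print the per-sample maxima. */
    int main(void)
    {
        int64_t nSamps = 4, nCells = 3;
        double *inArr = malloc((size_t)(nSamps * nCells) * sizeof *inArr);
        double *outArr = malloc((size_t)nSamps * sizeof *outArr);
        for (int64_t i = 0; i < nSamps * nCells; i++)
            inArr[i] = (double)(i % 7);   /* arbitrary test values */
        find_max(inArr, outArr, nSamps, nCells, 2);
        for (int64_t t = 0; t < nSamps; t++)
            printf("outArr[%lld] = %f\n", (long long)t, outArr[t]);
        free(inArr);
        free(outArr);
        return 0;
    }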
I am building this as an extension module for a Python library - the call to gcc is:
gcc -pthread -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -Wall -Wstrict-prototypes -c src.c -o src.o -fopenmp -fPIC -Ofast