Intel CPUs are capable of performing 512 or 1024 bitwise operations at once using vector instructions. Assume I have (pseudo)code that looks like this:
w0 = i0 & i1
w1 = i1 & i2
w2 = i0 & i3
w3 = w0 & w1
w4 = w1 & w2
Does GCC or the Intel compiler vectorize this code automatically, or do I need to rewrite it to benefit from vectorization? Ideally, I would like the first three operations to be performed in parallel, and then the next two to be computed in parallel.
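
For concreteness, here is a minimal sketch of the pseudocode as a C loop, assuming each of i0..i3 is an array of 64-bit words rather than a single scalar. The function name and_network, the parameter n, and the restrict qualifiers are assumptions made for illustration; they are not part of the original pseudocode. A loop of this shape is the kind of code an auto-vectorizer can usually handle, although whether a particular compiler version actually does so is not guaranteed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: applies the five ANDs from the pseudocode to each
 * of the n 64-bit words. Only w3 and w4 are stored; w0..w2 are temporaries.
 * restrict is an assumption: it promises the compiler that the arrays do
 * not overlap, which removes one common obstacle to vectorization. */
void and_network(size_t n,
                 const uint64_t *restrict i0, const uint64_t *restrict i1,
                 const uint64_t *restrict i2, const uint64_t *restrict i3,
                 uint64_t *restrict w3, uint64_t *restrict w4)
{
    for (size_t k = 0; k < n; ++k) {
        uint64_t w0 = i0[k] & i1[k];
        uint64_t w1 = i1[k] & i2[k];
        uint64_t w2 = i0[k] & i3[k];
        w3[k] = w0 & w1;
        w4[k] = w1 & w2;
    }
}
```

With GCC, one way to check is to compile with something like gcc -O3 -mavx512f -fopt-info-vec, which reports the loops that were vectorized. Note that the data parallelism here is across the index k (many words per instruction); the scheduling of the five ANDs within one iteration is left to the compiler and the CPU.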