I can find the n-th set bit of a 16-bit value with a lookup table, but this can't be done for a 32-bit value without breaking it up and using several LUTs. This is discussed in "How to efficiently find the n-th set bit?", which suggests a use similar to this:
#include <immintrin.h>
//...
unsigned i = 0x6B5;             // 11010110101
unsigned n = 4;                 //    ^
unsigned j = _pdep_u32(1u << n, i);
int bitnum = __builtin_ctz(j);  // result is bit 7
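For reference, this is my understanding of what `_pdep_u32` does, written as a minimal portable sketch that does not need BMI2: bit k of the source is deposited at the position of the k-th set bit of the mask. The names `pdep32_soft` and `nth_set_bit_pdep` are my own placeholders, not library functions, and the loop is only meant to illustrate the semantics, not to be fast. It still assumes GCC or Clang for `__builtin_ctz`.

```c
#include <stdint.h>

/* Software stand-in for _pdep_u32 (hypothetical helper, not a library call):
   bit k of src is placed at the position of the k-th set bit of mask. */
static uint32_t pdep32_soft(uint32_t src, uint32_t mask)
{
    uint32_t result = 0;
    for (uint32_t bit = 1; mask != 0; bit <<= 1) {
        uint32_t lowest = mask & (0u - mask);  /* isolate lowest set bit of mask */
        if (src & bit)
            result |= lowest;
        mask &= mask - 1;                      /* clear that bit of mask */
    }
    return result;
}

/* Index of the n-th (0-based) set bit of value, or -1 if there is none. */
static int nth_set_bit_pdep(uint32_t value, unsigned n)
{
    if (n >= 32)
        return -1;                             /* avoid an out-of-range shift */
    uint32_t j = pdep32_soft((uint32_t)1 << n, value);
    if (j == 0)
        return -1;                             /* __builtin_ctz(0) is undefined */
    return __builtin_ctz(j);
}
```

With the values from the snippet above, `nth_set_bit_pdep(0x6B5, 4)` gives 7.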
I have profiled this suggested loop method:
j = i;
for (k = 0; k < n; k++)
j &= j - 1;
return (__builtin_ctz(j));
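Wrapped up as a complete function, the loop method looks roughly like this. This is a sketch rather than my exact profiled code: the name `nth_set_bit_loop` is a placeholder, and the early exit and the -1 return for "fewer than n+1 bits set" are additions of mine, because `__builtin_ctz(0)` is undefined.

```c
#include <stdint.h>

/* Index of the n-th (0-based) set bit of value, or -1 if there is none.
   Hypothetical wrapper around the loop shown above. */
static int nth_set_bit_loop(uint32_t value, unsigned n)
{
    uint32_t j = value;
    for (unsigned k = 0; k < n && j != 0; k++)
        j &= j - 1;                   /* clear the lowest set bit */
    return j ? __builtin_ctz(j) : -1; /* guard against ctz(0) */
}
```

For `i = 0x6B5` and `n = 4` this clears bits 0, 2, 4 and 5 and returns 7.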
against the two variants of the bit-twiddling code in an answer from Nominal Animal. The branching variant of the twiddle was the fastest of the three code versions, though not by much. However, in my actual code, the above snippet using __builtin_ctz was faster. Other answers, such as the one from fuz, suggest what I found too: the bit-twiddles take about as long as the loop method.
So I now want to try using _pdep_u32, but I am unable to get it recognised. I read on gcc.gnu.org:
The following built-in functions are available when -mbmi2 is used.
...
unsigned int _pdep_u32 (unsigned int, unsigned int)
unsigned long long _pdep_u64 (unsigned long long, unsigned long long)
...
However I am using an online compiler and don't have access to the options.
Is there a way to enable options with the preprocessor?
There is plenty of material about controlling the preprocessor from command line options, but I can't find how to do it the other way round.
Ultimately I want to use the 64-bit version, _pdep_u64.