I've been trying to get some std::atan2 code to auto-vectorize. I've been able to get it to the point where GCC doesn't complain about asin, but it doesn't seem to be able to handle atan2 (nor atan, for that matter).
Here is a link to the godbolt version. Here is the source code:
#include <cmath>
#include <cstddef>
#include <array>

constexpr std::size_t array_length = 16;

void calc_uv_coordinates(
    const std::array<float,array_length>& pos_x,
    const std::array<float,array_length>& pos_y,
    const std::array<float,array_length>& pos_z,
    std::array<float,array_length>& tex_u,
    std::array<float,array_length>& tex_v) noexcept{
    #pragma clang loop vectorize(enable)
    for(std::size_t i = 0; i < array_length; ++i){
        tex_u[i] = std::atan2(pos_z[i], pos_x[i]) / (2 * M_PI);
        tex_v[i] = std::asin(pos_y[i]) / M_PI;
    }
}
Here are the compiler arguments:
clang 9.0: -c -Ofast -ffast-math -Rpass-analysis=loop-vectorize
gcc 9.2: -c -Ofast -ffast-math -fopt-info-vec-missed -fopt-info-vec
Clang says:
In file included from <source>:2:
/opt/compiler-explorer/gcc-9.2.0/lib/gcc/x86_64-linux-gnu/9.2.0/../../../../include/c++/9.2.0/cmath:145:12: remark: loop not vectorized: call instruction cannot be vectorized [-Rpass-analysis]
  { return __builtin_atan2f(__y, __x); }
           ^
<source>:15:5: warning: loop not vectorized: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
    for(std::size_t i = 0; i < array_length; ++i){
    ^
1 warning generated.
ASM generation compiler returned: 0
GCC says:
<source>:15:30: missed: couldn't vectorize loop
/opt/compiler-explorer/gcc-9.2.0/include/c++/9.2.0/cmath:145:28: missed: not vectorized: relevant stmt not supported: _15 = __builtin_atan2f (_2, _1);
ASM generation compiler returned: 0
I'm not sure I understand what "relevant stmt not supported: _15 = __builtin_atan2f (_2, _1);" is saying. I believe it means that GCC has not put in the manual effort to vectorize this builtin. It appears that GCC only auto-vectorizes trig functions that map to built-in assembly instructions, but I don't understand why that takes atan2f off the table for vectorization, or why it is treated any differently from an arbitrary arithmetic function.
Other libraries have SIMD atan and atan2, so it isn't as if this function is impossible to vectorize. In principle, this should just be each step of atan2 being applied across all elements at once.
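To illustrate what I mean by "each part of atan2 applied across all elements", here is a minimal sketch of a branch-free atan2 approximation written as plain arithmetic, with no libm call left in the loop body. The names atan2_approx and calc_u_approx are hypothetical, and the polynomial coefficients are an assumed fit to atan on [0, 1], so this is not bit-compatible with std::atan2 and its accuracy would need checking. I have not confirmed that any particular compiler vectorizes it; that would have to be verified with the same remark flags as above.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical helper, not part of any library: approximates atan2(y, x)
// using only fabs/min/max, arithmetic, and selects, so the loop body
// contains no call into libm.
// ASSUMPTION: the coefficients below are an odd polynomial fit to atan on
// [0, 1]; compare against std::atan2 before relying on the accuracy.
inline float atan2_approx(float y, float x) noexcept {
    constexpr float pi = 3.14159265358979f;
    const float ax = std::fabs(x);
    const float ay = std::fabs(y);
    const float mx = std::max(ax, ay);
    const float mn = std::min(ax, ay);
    // Ratio in [0, 1]; the guard makes atan2_approx(0, 0) return 0.
    const float t = (mx > 0.0f) ? mn / mx : 0.0f;
    const float t2 = t * t;
    // Horner evaluation of t * (a1 + a3*t^2 + ... + a11*t^10).
    float p = -0.01172120f;
    p = p * t2 + 0.05265332f;
    p = p * t2 - 0.11643287f;
    p = p * t2 + 0.19354346f;
    p = p * t2 - 0.33262347f;
    p = p * t2 + 0.99997726f;
    float r = p * t;                   // angle in [0, pi/4]
    r = (ay > ax) ? (pi / 2) - r : r;  // undo the min/max swap: [0, pi/2]
    r = (x < 0.0f) ? pi - r : r;       // left half-plane: [0, pi]
    return (y < 0.0f) ? -r : r;        // lower half-plane: negate
}

constexpr std::size_t approx_array_length = 16;

// Same shape as the tex_u half of calc_uv_coordinates, but calling the
// hypothetical approximation instead of std::atan2.
void calc_u_approx(const std::array<float, approx_array_length>& pos_x,
                   const std::array<float, approx_array_length>& pos_z,
                   std::array<float, approx_array_length>& tex_u) noexcept {
    constexpr float inv_two_pi = 0.15915494f;  // 1 / (2 * pi)
    for (std::size_t i = 0; i < approx_array_length; ++i) {
        tex_u[i] = atan2_approx(pos_z[i], pos_x[i]) * inv_two_pi;
    }
}
```

The selects replace the quadrant branches a scalar implementation would use, which is the part I would expect a vectorizer to turn into blends. Everything stays in float, unlike the division by 2 * M_PI in my original loop, which promotes to double.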