I have a simple math library that is linked into a project running on simulator hardware (a 32-bit RTOS); the compiler toolchain is based on a variant of GCC 5.5. The main project code is in Matlab, but the core math operations (cmath functions applied to array data) are rewritten in C for performance. Looking at Compiler Explorer, the quality of the optimised code does not seem great for 32-bit GCC 5.5 (for reference: Clang trunk, 32-bit). From what I understand, Clang does a better job of optimising the loops.
As I am unable to use Clang directly, is there any value in rewriting the C source using AVX intrinsics? I think the majority of the performance cost comes from the cmath function calls, most of which do not have intrinsic implementations.