I have production code that uses AVX2 and works well. Now I am trying to use AVX-512 to speed it up. I wrote a small program to test the speed; it only does some float matrix multiplication. The result shows that AVX-512 is slower than AVX2. Is there anything wrong?
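The actual kernel is only in the Godbolt links. Here is a minimal sketch of what AVX2 and AVX-512 versions of a simple broadcast-and-FMA float matrix multiplication might look like. It assumes row-major n x n matrices with n a multiple of 16. All function names (matmul_scalar, matmul_avx2, matmul_avx512, matmul_dispatch) are hypothetical, and this is not the code from the links. The per-function target attributes and __builtin_cpu_supports are GCC/Clang extensions. They let both kernels live in one file and pick one at run time, so the sketch also runs on CPUs without AVX-512.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>

/* Hypothetical reference kernel: C = A * B, row-major n x n floats. */
static void matmul_scalar(const float *a, const float *b, float *c, size_t n) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            float sum = 0.0f;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

/* AVX2 + FMA: 8 floats per __m256. Broadcast a[i][k], multiply it by
   8 consecutive elements of row k of B, accumulate into c[i][j..j+7]. */
__attribute__((target("avx2,fma")))
static void matmul_avx2(const float *a, const float *b, float *c, size_t n) {
    assert(n % 8 == 0);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j += 8) {
            __m256 acc = _mm256_setzero_ps();
            for (size_t k = 0; k < n; k++) {
                __m256 av = _mm256_set1_ps(a[i * n + k]);
                __m256 bv = _mm256_loadu_ps(&b[k * n + j]);
                acc = _mm256_fmadd_ps(av, bv, acc);
            }
            _mm256_storeu_ps(&c[i * n + j], acc);
        }
}

/* AVX-512F: same loop structure, 16 floats per __m512. */
__attribute__((target("avx512f")))
static void matmul_avx512(const float *a, const float *b, float *c, size_t n) {
    assert(n % 16 == 0);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j += 16) {
            __m512 acc = _mm512_setzero_ps();
            for (size_t k = 0; k < n; k++) {
                __m512 av = _mm512_set1_ps(a[i * n + k]);
                __m512 bv = _mm512_loadu_ps(&b[k * n + j]);
                acc = _mm512_fmadd_ps(av, bv, acc);
            }
            _mm512_storeu_ps(&c[i * n + j], acc);
        }
}

/* Use the widest kernel the running CPU supports, else fall back to scalar. */
void matmul_dispatch(const float *a, const float *b, float *c, size_t n) {
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f") && n % 16 == 0)
        matmul_avx512(a, b, c, n);
    else if (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma") && n % 8 == 0)
        matmul_avx2(a, b, c, n);
    else
        matmul_scalar(a, b, c, n);
}
```

The AVX-512 kernel issues half as many FMA instructions per row of C as the AVX2 one, which is where the expected speedup would come from. Whether that speedup materializes also depends on how memory-bound the loop is, which this sketch does not try to address.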
I found one solution in this question: enable -mprefer-vector-width=512 in GCC. But this does not work for me. I tried GCC 7, 8 and 9.
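Whether -mprefer-vector-width=512 can help at all depends on how the kernel is written. It only steers GCC's auto-vectorizer; hand-written _mm256_* or _mm512_* intrinsics keep the width they were written with. As far as I know, the option first appeared in GCC 8, so GCC 7 would not accept it. One way to see its effect is to compile a plain loop to assembly and check which registers appear. The sketch below assumes a hypothetical saxpy function saved as a hypothetical saxpy.c.

```c
#include <assert.h>
#include <stddef.h>

/* y[i] += alpha * x[i]: a plain loop GCC's auto-vectorizer can handle.
   Hypothetical check on an AVX-512 machine, with this file saved as saxpy.c:
     gcc -O3 -march=native -S saxpy.c
     gcc -O3 -march=native -mprefer-vector-width=512 -S saxpy.c
   then search saxpy.s for "zmm". With GCC 8+ on Skylake-SP the first
   build typically uses %ymm registers; the second should use %zmm. */
void saxpy(float *restrict y, const float *restrict x, float alpha, size_t n) {
    assert(n == 0 || (y != NULL && x != NULL));
    for (size_t i = 0; i < n; i++)
        y[i] += alpha * x[i];
}
```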
The processor I used is an Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz.
I put the core matrix multiplication function in the Godbolt links. My test is just a loop that calls this function.
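The post only says that the test calls the core function in a loop. Here is a minimal sketch of such a harness. It assumes the kernel has the signature void f(const float *a, const float *b, float *c, size_t n); that signature is an assumption, since the real one is only on Godbolt. The names matmul_fn, matmul_naive, now_seconds and bench_matmul are hypothetical, and matmul_naive is just a stand-in kernel.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Assumed kernel signature: C = A * B for row-major n x n float matrices. */
typedef void (*matmul_fn)(const float *a, const float *b, float *c, size_t n);

/* Hypothetical stand-in kernel; swap in the real AVX2 or AVX-512 function. */
void matmul_naive(const float *a, const float *b, float *c, size_t n) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            float sum = 0.0f;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

/* Monotonic wall-clock time in seconds (POSIX clock_gettime). */
static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Calls fn once untimed, then `iters` timed times; returns mean seconds per call. */
double bench_matmul(matmul_fn fn, const float *a, const float *b, float *c,
                    size_t n, int iters) {
    assert(fn != NULL && iters > 0);
    fn(a, b, c, n); /* warm-up: first-touch page faults stay out of the timing */
    double start = now_seconds();
    for (int it = 0; it < iters; it++)
        fn(a, b, c, n);
    return (now_seconds() - start) / iters;
}
```

For example, bench_matmul(matmul_naive, a, b, c, 64, 100) returns the average time of 100 calls. Running the same call with the AVX2 and AVX-512 kernels on the same buffers gives a like-for-like comparison.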
AVX2 build command: gcc -O3 -mavx -mfma src/test_avx2.c -o test_avx2
AVX-512 build command: gcc -O3 -mprefer-vector-width=512 -Ofast -march=native src/test_avx512.c -o test_avx512
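The two command lines enable different instruction sets. A small check is to report the SIMD macros that GCC and Clang predefine from the -m and -march flags. The sketch below does this; the names compiled_isa_mask, print_compiled_isa and the ISA_* flags are hypothetical. It reflects what the compiler was allowed to emit, not what the CPU supports.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical bit flags for the SIMD extensions the compiler may target. */
enum { ISA_AVX = 1, ISA_AVX2 = 2, ISA_FMA = 4, ISA_AVX512F = 8 };

/* Reads GCC/Clang predefined macros, so the result depends on the flags
   used to build this file, not on the CPU it runs on. */
unsigned compiled_isa_mask(void) {
    unsigned mask = 0;
#ifdef __AVX__
    mask |= ISA_AVX;
#endif
#ifdef __AVX2__
    mask |= ISA_AVX2;
#endif
#ifdef __FMA__
    mask |= ISA_FMA;
#endif
#ifdef __AVX512F__
    mask |= ISA_AVX512F;
#endif
    /* In GCC and Clang, -mavx512f implies -mavx2, which implies -mavx. */
    assert(!(mask & ISA_AVX512F) || (mask & ISA_AVX2));
    assert(!(mask & ISA_AVX2) || (mask & ISA_AVX));
    return mask;
}

void print_compiled_isa(void) {
    unsigned m = compiled_isa_mask();
    printf("AVX=%d AVX2=%d FMA=%d AVX512F=%d\n",
           (m & ISA_AVX) != 0, (m & ISA_AVX2) != 0,
           (m & ISA_FMA) != 0, (m & ISA_AVX512F) != 0);
}
```

Call print_compiled_isa() from a file built with each command line above. With GCC, the first build should report AVX2=0, because -mavx -mfma enables AVX and FMA but not AVX2. The second line also passes -Ofast after -O3. The last -O option wins, and -Ofast additionally enables -ffast-math, so the two builds differ in more than vector width.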