I have a program written in C. It is compiled with GCC on Linux, *BSD, and Solaris.
I want the binary to run on all "x64" processors, so I use the -march=x86-64 option. I have also been experimenting with different -mtune options. My machine has an AMD Ryzen 9 5950X, so I would expect -mtune=znver3 to produce the fastest code for my CPU. However, I ran an exhaustive test of all -mtune values supported by GCC. According to my test, there is hardly any difference between -mtune=generic and the other values, except for -mtune=nocona, which surprisingly produces faster code than any of the other -mtune values!
The full results are attached to this post. The runtime of the program for each -mtune value is given in seconds; lower (faster) is better. Tested with GCC version 12.1.0 on Debian Linux:
https://docdro.id/5D1wMEM
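A sweep of this kind can be sketched roughly as follows. This is only an illustration, not the exact script used: the source file name prog.c, the -O2 optimization level, and the short list of tune values are placeholders, and the real test covered every -mtune value GCC accepts.

```shell
#!/bin/bash
# Minimal sketch of an -mtune sweep. All names below are assumptions:
# prog.c, -O2 and the TUNES list are placeholders, not the real setup.
SRC=${SRC:-prog.c}
CC=${CC:-gcc}
TUNES=${TUNES:-"generic nocona znver3"}

# Print the target flags for one tune value; -march stays at the baseline
# so every binary still runs on any x86-64 CPU.
tune_flags() {
    printf '%s %s\n' "-march=x86-64" "-mtune=$1"
}

# Build one binary per tune value and time a single run of each.
sweep() {
    for tune in $TUNES; do
        # $(tune_flags ...) is left unquoted on purpose so it splits into two flags.
        $CC -O2 $(tune_flags "$tune") -o "prog-$tune" "$SRC" || continue
        echo "== -mtune=$tune"
        time "./prog-$tune"   # 'time' here is the bash keyword
    done
}

# Only run the sweep when explicitly requested.
if [ "${RUN_SWEEP:-0}" = 1 ]; then
    sweep
fi
```

Running it as RUN_SWEEP=1 ./sweep.sh (the script name is also hypothetical) builds and times each variant once; for meaningful numbers each binary should be run several times on an otherwise idle machine.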
How come -mtune=nocona, of all things, produces faster code? To my understanding, nocona is some old "improved version of Intel Pentium 4", very different from my Zen 3 🤔
Thank you!