I am trying to compile generated C code which comes from large Dymola models. The generated code is unlike what a human would write: there are many unrolled loops, extensive use of macros, and huge manually indexed arrays, and, most importantly, the source files are massive (>1e6 lines).
When compiling these source files with `-O2` or `-O3`, my compile times become unmanageably high: 10-30 minutes per file, with both Clang and GCC. I can't follow the generated assembly code well enough to judge the quality of the optimisation. Compile time can be reduced by not generating debug info or by turning off warnings, but these savings are small compared to turning off optimisation entirely. At runtime there is a noticeable difference between `-O0` and `-O2`, so I cannot justify compiling without optimisation. When compiling with `-ftime-trace`, I can see that the Clang frontend is responsible for >90% of the time. According to `htop`, the process is not bottlenecked by memory; it appears to be entirely CPU bound.
Is there any preprocessing I can do to improve compile times? Will breaking the source file into smaller chunks improve performance, and why? Are compilers designed to work with such huge source files? Are there any other compiler options I should be aware of?
Surprisingly, MSVC on Windows with `/O2` takes a fraction of the time that Clang and GCC take.
Example of compiler arguments:

```
clang -m64 -Wno-everything -c -D_GNU_SOURCE -DMATLAB_MEX_FILE -ftime-report -DFMI2_FUNCTION_PREFIX=F2_Simulations_SteadyState_SteadyState2019MPU_ -DRT -I/opt/matlab/r2017b/extern/include -I/opt/matlab/r2017b/simulink/include -I/mnt/vagrant_shared/<path>/Source -I/mnt/vagrant_shared/<path>/export -fexceptions -fPIC -fno-omit-frame-pointer -pthread -O0 -DNDEBUG -std=c99 /mnt/vagrant_shared/<path>/some_file.c -o /mnt/vagrant_shared/<path>/some_obj.obj
```
Platform: CentOS 7 running in a VirtualBox VM. Clang 7, GCC 4.8 (I am stuck on these older versions because of other requirements).