I have a simple LED-blinking program running on an STM32F103C8 (initialization boilerplate omitted):
    void soft_delay(void) {
        for (volatile uint32_t i = 0; i < 2000000; ++i) { }
    }

    uint32_t iters = 0;
    while (1) {
        LL_GPIO_TogglePin(LED_GPIO_Port, LED_Pin);
        soft_delay();
        ++iters;
    }
It was compiled with both Keil uVision v.5 (default compiler) and CLion using the arm-none-eabi-gcc compiler. The surprise is that the arm-none-eabi-gcc build runs 50% slower in Release mode (-O2 -flto) and 100% slower in Debug mode.
I suspect three possible reasons:
1. Keil over-optimization (unlikely, because the code is very simple)
2. arm-none-eabi-gcc under-optimization due to wrong compiler flags (I use the CLion Embedded plugin's CMakeLists.txt)
3. A bug in the initialization that leaves the chip at a lower clock frequency with arm-none-eabi-gcc (to be investigated)
I have not yet dug into the optimization settings or the disassembly; I hope that experienced embedded developers who have already encountered this issue can point me to the answer.
UPDATE 1
Playing around with different optimization levels of Keil ArmCC, I can see how they affect the generated code. The effect is drastic, especially on execution time. Here are the benchmarks and the disassembly of the soft_delay() function for each optimization level (RAM and Flash amounts include initialization code).
-O0: RAM: 1032, Flash: 1444, Execution Time (20 iterations): 18.7 sec
    soft_delay PROC
            PUSH     {r3,lr}
            MOVS     r0,#0
            STR      r0,[sp,#0]
            B        |L6.14|
    |L6.8|
            LDR      r0,[sp,#0]
            ADDS     r0,r0,#1
            STR      r0,[sp,#0]
    |L6.14|
            LDR      r1,|L6.24|
            LDR      r0,[sp,#0]
            CMP      r0,r1
            BCC      |L6.8|
            POP      {r3,pc}
            ENDP
-O1: RAM: 1032, Flash: 1216, Execution Time (20 iterations): 13.3 sec
    soft_delay PROC
            PUSH     {r3,lr}
            MOVS     r0,#0
            STR      r0,[sp,#0]
            LDR      r0,|L6.24|
            B        |L6.16|
    |L6.10|
            LDR      r1,[sp,#0]
            ADDS     r1,r1,#1
            STR      r1,[sp,#0]
    |L6.16|
            LDR      r1,[sp,#0]
            CMP      r1,r0
            BCC      |L6.10|
            POP      {r3,pc}
            ENDP
-O2 -Otime: RAM: 1032, Flash: 1136, Execution Time (20 iterations): 9.8 sec
    soft_delay PROC
            SUB      sp,sp,#4
            MOVS     r0,#0
            STR      r0,[sp,#0]
            LDR      r0,|L4.24|
    |L4.8|
            LDR      r1,[sp,#0]
            ADDS     r1,r1,#1
            STR      r1,[sp,#0]
            CMP      r1,r0
            BCC      |L4.8|
            ADD      sp,sp,#4
            BX       lr
            ENDP
-O3: RAM: 1032, Flash: 1176, Execution Time (20 iterations): 9.9 sec
    soft_delay PROC
            PUSH     {r3,lr}
            MOVS     r0,#0
            STR      r0,[sp,#0]
            LDR      r0,|L5.20|
    |L5.8|
            LDR      r1,[sp,#0]
            ADDS     r1,r1,#1
            STR      r1,[sp,#0]
            CMP      r1,r0
            BCC      |L5.8|
            POP      {r3,pc}
            ENDP
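To compare these timings with the disassembly, it may help to convert each total into a cost per pass of the delay loop. The sketch below is host-side arithmetic only, not target code. It assumes that "20 iterations" means 20 calls to soft_delay() of 2,000,000 passes each, and it ignores the GPIO toggle and call overhead. The helper names and the two candidate core clocks (8 MHz for the default HSI, 72 MHz for the part's maximum) are my own assumptions; the real clock has not been measured here.

```c
#include <stdio.h>

/* Nanoseconds spent in one pass of the soft_delay() loop, given the total
   run time, the number of soft_delay() calls and the passes per call. */
static double ns_per_pass(double total_seconds, unsigned calls,
                          unsigned long passes_per_call)
{
    return total_seconds * 1e9 / ((double)calls * (double)passes_per_call);
}

/* Core cycles per pass implied by an ASSUMED core clock in Hz. */
static double cycles_per_pass(double ns, double core_hz)
{
    return ns * 1e-9 * core_hz;
}

/* Print one report line per optimization level from the benchmarks. */
static void print_report(void)
{
    static const struct { const char *level; double seconds; } runs[] = {
        { "-O0", 18.7 }, { "-O1", 13.3 }, { "-O2 -Otime", 9.8 }, { "-O3", 9.9 },
    };
    const unsigned calls = 20;              /* assumption: 20 LED toggles */
    const unsigned long passes = 2000000UL; /* loop bound in soft_delay() */

    for (size_t k = 0; k < sizeof runs / sizeof runs[0]; ++k) {
        double ns = ns_per_pass(runs[k].seconds, calls, passes);
        printf("%-10s %7.1f ns/pass  %5.2f cycles @ 8 MHz  %5.2f cycles @ 72 MHz\n",
               runs[k].level, ns,
               cycles_per_pass(ns, 8e6), cycles_per_pass(ns, 72e6));
    }
}
```

Calling print_report() from a small main() on a PC prints the table. If the 8 MHz column comes out near two cycles per pass for the five-instruction -O2 loop (which includes a load and a store), that figure would be implausibly low, which would hint that the Keil build runs at a higher clock. Repeating the same arithmetic with the arm-none-eabi-gcc timings might help separate the clock-setup hypothesis from the code-generation one.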
TODO: benchmarking and disassembly for arm-none-eabi-gcc.