I have a simple blinking led program running on STM32f103C8 (without initialization boilerplate):
void soft_delay(void) {
for (volatile uint32_t i=0; i<2000000; ++i) { }
}
uint32_t iters = 0;
while (1)
{
LL_GPIO_TogglePin(LED_GPIO_Port, LED_Pin);
soft_delay();
++iters;
}
It was compiled with both Keil uVision v.5
(default compiler) and CLion
using arm-none-eabi-gcc
compiler.
The surprise is that arm-none-eabi-gcc program runs 50% slower in Release mode (-O2 -flto) and 100% slower in Debug mode.
I suspect 3 reasons:
Keil over-optimization (unlikely, because the code is very simple)
arm-none-eabi-gcc under-optimization due to wrong compiler flags (I use CLion Embedded plugins` CMakeLists.txt)
A bug in the initialization so that chip has lower clock frequency with arm-none-eabi-gcc (to be investigated)
I have not yet dived into the jungles of optimization and disassembling, I hope that there are many experienced embedded developers who already encountered this issue and have the answer.
UPDATE 1
Playing around with different optimization levels of Keil ArmCC, I see
how it affects the generated code. And it affects drastically, especially execution time. Here are the benchmarks and disassembly of soft_delay()
function for each optimization level (RAM and Flash amounts include initialization code).
-O0: RAM: 1032, Flash: 1444, Execution Time (20 iterations): 18.7 sec
soft_delay PROC
PUSH {r3,lr}
MOVS r0,#0
STR r0,[sp,#0]
B |L6.14|
|L6.8|
LDR r0,[sp,#0]
ADDS r0,r0,#1
STR r0,[sp,#0]
|L6.14|
LDR r1,|L6.24|
LDR r0,[sp,#0]
CMP r0,r1
BCC |L6.8|
POP {r3,pc}
ENDP
-O1: RAM: 1032, Flash: 1216, Execution Time (20 iterations): 13.3 sec
soft_delay PROC
PUSH {r3,lr}
MOVS r0,#0
STR r0,[sp,#0]
LDR r0,|L6.24|
B |L6.16|
|L6.10|
LDR r1,[sp,#0]
ADDS r1,r1,#1
STR r1,[sp,#0]
|L6.16|
LDR r1,[sp,#0]
CMP r1,r0
BCC |L6.10|
POP {r3,pc}
ENDP
-O2 -Otime: RAM: 1032, Flash: 1136, Execution Time (20 iterations): 9.8 sec
soft_delay PROC
SUB sp,sp,#4
MOVS r0,#0
STR r0,[sp,#0]
LDR r0,|L4.24|
|L4.8|
LDR r1,[sp,#0]
ADDS r1,r1,#1
STR r1,[sp,#0]
CMP r1,r0
BCC |L4.8|
ADD sp,sp,#4
BX lr
ENDP
-O3: RAM: 1032, Flash: 1176, Execution Time (20 iterations): 9.9 sec
soft_delay PROC
PUSH {r3,lr}
MOVS r0,#0
STR r0,[sp,#0]
LDR r0,|L5.20|
|L5.8|
LDR r1,[sp,#0]
ADDS r1,r1,#1
STR r1,[sp,#0]
CMP r1,r0
BCC |L5.8|
POP {r3,pc}
ENDP
TODO: benchmarking and disassembly for arm-none-eabi-gcc
.