Channel: Active questions tagged gcc - Stack Overflow

Does arm-none-eabi-gcc produce slower code than Keil uVision?


I have a simple LED-blinking program running on an STM32F103C8 (initialization boilerplate omitted):

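// Busy-wait delay; the volatile counter keeps the compiler from removing the empty loop.
void soft_delay(void) {
    for (volatile uint32_t i = 0; i < 2000000; ++i) { }
}

// Inside main(), after initialization:
uint32_t iters = 0;
while (1)
{
    LL_GPIO_TogglePin(LED_GPIO_Port, LED_Pin);
    soft_delay();
    ++iters;  // counts completed toggle/delay cycles
}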

I compiled it with both Keil uVision 5 (default compiler) and CLion using the arm-none-eabi-gcc compiler. Surprisingly, the arm-none-eabi-gcc build runs 50% slower in Release mode (-O2 -flto) and 100% slower in Debug mode.

I suspect three possible causes:

  • Keil over-optimization (unlikely, because the code is very simple)

  • arm-none-eabi-gcc under-optimization due to wrong compiler flags (I use the CMakeLists.txt from the CLion Embedded plugin)

  • A bug in the initialization that leaves the chip running at a lower clock frequency with arm-none-eabi-gcc (to be investigated)
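
One way to test the clock-frequency hypothesis is to read the RCC_CFGR register in both builds and compare the system clock it implies. The following is a minimal sketch, not a drop-in implementation: sysclk_from_cfgr is a hypothetical helper, the bit layout is my reading of the STM32F1 reference manual (SWS in bits 3:2, PLLSRC in bit 16, PLLXTPRE in bit 17, PLLMUL in bits 21:18), and the HSE crystal frequency is an assumption that depends on the board.

```c
#include <stdint.h>

/* Hypothetical helper: derive SYSCLK in Hz from an STM32F1 RCC_CFGR value.
 * hse_hz is the external crystal frequency (board-specific assumption).
 * Returns 0 for the reserved clock-source encoding. */
uint32_t sysclk_from_cfgr(uint32_t cfgr, uint32_t hse_hz)
{
    const uint32_t hsi_hz = 8000000u;          /* internal RC oscillator */
    uint32_t sws = (cfgr >> 2) & 0x3u;         /* system clock switch status */

    if (sws == 0u) return hsi_hz;              /* running directly from HSI */
    if (sws == 1u) return hse_hz;              /* running directly from HSE */
    if (sws != 2u) return 0u;                  /* reserved encoding */

    /* PLL selected: field value 0 means x2, capped at x16 */
    uint32_t mul = ((cfgr >> 18) & 0xFu) + 2u;
    if (mul > 16u) mul = 16u;

    uint32_t src;
    if (cfgr & (1u << 16)) {
        /* PLL fed from HSE, optionally divided by 2 (PLLXTPRE) */
        src = (cfgr & (1u << 17)) ? hse_hz / 2u : hse_hz;
    } else {
        /* PLL fed from HSI divided by 2 */
        src = hsi_hz / 2u;
    }
    return src * mul;
}
```

On the target, the argument would be the value of RCC->CFGR read after initialization. If the Keil build and the GCC build yield different results, the slowdown comes from clock setup rather than code generation. Flash wait states are not covered by this sketch and would need a separate check.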

I have not yet dug into the optimization settings or the disassembly. I hope that experienced embedded developers who have already run into this issue can point to the answer.

UPDATE 1

Playing around with the different optimization levels of Keil ArmCC, I can see how strongly they affect the generated code, and execution time in particular. Below are the benchmarks and the disassembly of the soft_delay() function for each optimization level (the RAM and Flash figures include the initialization code).

-O0: RAM: 1032, Flash: 1444, Execution Time (20 iterations): 18.7 sec

soft_delay PROC
        PUSH     {r3,lr}
        MOVS     r0,#0
        STR      r0,[sp,#0]
        B        |L6.14|
|L6.8|
        LDR      r0,[sp,#0]
        ADDS     r0,r0,#1
        STR      r0,[sp,#0]
|L6.14|
        LDR      r1,|L6.24|
        LDR      r0,[sp,#0]
        CMP      r0,r1
        BCC      |L6.8|
        POP      {r3,pc}
        ENDP

-O1: RAM: 1032, Flash: 1216, Execution Time (20 iterations): 13.3 sec

soft_delay PROC
        PUSH     {r3,lr}
        MOVS     r0,#0
        STR      r0,[sp,#0]
        LDR      r0,|L6.24|
        B        |L6.16|
|L6.10|
        LDR      r1,[sp,#0]
        ADDS     r1,r1,#1
        STR      r1,[sp,#0]
|L6.16|
        LDR      r1,[sp,#0]
        CMP      r1,r0
        BCC      |L6.10|
        POP      {r3,pc}
        ENDP

-O2 -Otime: RAM: 1032, Flash: 1136, Execution Time (20 iterations): 9.8 sec

soft_delay PROC
        SUB      sp,sp,#4
        MOVS     r0,#0
        STR      r0,[sp,#0]
        LDR      r0,|L4.24|
|L4.8|
        LDR      r1,[sp,#0]
        ADDS     r1,r1,#1
        STR      r1,[sp,#0]
        CMP      r1,r0
        BCC      |L4.8|
        ADD      sp,sp,#4
        BX       lr
        ENDP

-O3: RAM: 1032, Flash: 1176, Execution Time (20 iterations): 9.9 sec

soft_delay PROC
        PUSH     {r3,lr}
        MOVS     r0,#0
        STR      r0,[sp,#0]
        LDR      r0,|L5.20|
|L5.8|
        LDR      r1,[sp,#0]
        ADDS     r1,r1,#1
        STR      r1,[sp,#0]
        CMP      r1,r0
        BCC      |L5.8|
        POP      {r3,pc}
        ENDP
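
To compare the timings above with the disassembly, it can help to convert them into time per loop iteration. This is a rough sketch under two assumptions: "20 iterations" means 20 calls to soft_delay() of 2,000,000 loop iterations each, and the toggle and call overhead is negligible. The function names are made up for illustration, and the core clock is an input because the question does not state it.

```c
#include <stdint.h>

/* Average time per inner-loop iteration, in nanoseconds. */
double ns_per_iteration(double total_seconds, uint32_t calls, uint32_t iters_per_call)
{
    double total_iters = (double)calls * (double)iters_per_call;
    return total_seconds * 1e9 / total_iters;
}

/* Implied core cycles per iteration for an ASSUMED core clock in Hz. */
double cycles_per_iteration(double ns_per_iter, double core_hz)
{
    return ns_per_iter * core_hz / 1e9;
}
```

For the -O2 -Otime result, 9.8 s over 40,000,000 iterations is about 245 ns per iteration. Feeding that into cycles_per_iteration with a candidate clock gives a cycle count to compare against the five-instruction loop in the listing. An implausibly high or low count suggests the assumed clock is wrong, or that flash wait states contribute.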

TODO: benchmarking and disassembly for arm-none-eabi-gcc.

