Channel: Active questions tagged gcc - Stack Overflow

Optimization of copying with an explicit byte order in arm-none-eabi-gcc

While writing part of a deserializer for a data structure in C, I needed a way to read 16-bit and 32-bit integers. Because this code may be compiled for and run on an architecture that is not little-endian, I wrote helper functions that explicitly decode from little-endian byte order:

#include <stdint.h>

void read_16(uint8_t *data, uint16_t *value) {
    *value = data[0] | (data[1] << 8);
}

void read_32(uint8_t *data, uint32_t *value) {
    *value = data[0] | (data[1] << 8) | (data[2] << 16) | (data[3] << 24);
}
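
One detail in these helpers is worth noting: each `uint8_t` operand is promoted to `int` before the shift, so on a target with 32-bit `int`, `data[3] << 24` shifts into the sign bit whenever the top bit of `data[3]` is set, which the C standard leaves undefined. The following is a minimal sketch of the same decoding with explicit unsigned casts. The `_le` names and the `const` qualifiers are assumptions added for illustration; this is not the code that produced the disassembly in this question.

```c
#include <stdint.h>

/* Hypothetical variant of read_16: decodes a little-endian 16-bit value. */
void read_16_le(const uint8_t *data, uint16_t *value) {
    /* The result fits in 16 bits, so the narrowing cast loses nothing. */
    *value = (uint16_t)(data[0] | ((uint16_t)data[1] << 8));
}

/* Hypothetical variant of read_32: decodes a little-endian 32-bit value. */
void read_32_le(const uint8_t *data, uint32_t *value) {
    /* Casting each byte to uint32_t keeps every shift in unsigned arithmetic. */
    *value = (uint32_t)data[0]
           | ((uint32_t)data[1] << 8)
           | ((uint32_t)data[2] << 16)
           | ((uint32_t)data[3] << 24);
}
```

For the byte sequence `78 56 34 92`, `read_16_le` yields `0x5678` and `read_32_le` yields `0x92345678` regardless of the host's byte order.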

I was curious how this might be compiled on an architecture that is natively little-endian. arm-none-eabi-gcc with -mcpu=cortex-a9 and -Os gives the following output:

00000000 <read_16>:
   0:   e5d02001    ldrb    r2, [r0, #1]
   4:   e5d03000    ldrb    r3, [r0]
   8:   e1833402    orr r3, r3, r2, lsl #8
   c:   e1c130b0    strh    r3, [r1]
  10:   e12fff1e    bx  lr

00000014 <read_32>:
  14:   e5903000    ldr r3, [r0]
  18:   e5813000    str r3, [r1]
  1c:   e12fff1e    bx  lr

Question: Is there a reason why the optimizer simplifies the 32-bit case to a single load followed by a store, but does not do the same for the 16-bit case, given that the transformation is equally valid there, would be shorter and faster, and optimization for size is enabled?

Specifically, I would expect the following assembly for read_16:

ldrh    r3, [r0]
strh    r3, [r1]
bx      lr
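
For comparison, the expected `ldrh`/`strh` pair is a plain native-order halfword copy. In C terms that is roughly the sketch below, which matches `read_16` only on a little-endian target. The function names are hypothetical, and nothing here guarantees that a particular compiler emits exactly those two instructions for it.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies two bytes in the host's native byte order. */
void copy_16_native(const uint8_t *data, uint16_t *value) {
    /* memcpy avoids alignment and aliasing problems of a pointer cast. */
    memcpy(value, data, sizeof *value);
}

/* Hypothetical helper: returns 1 on a little-endian host, 0 otherwise. */
int host_is_little_endian(void) {
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);  /* inspect the lowest-addressed byte */
    return first == 1;
}
```

On a little-endian host, `copy_16_native` applied to the bytes `34 12` stores `0x1234`, the same value `read_16` produces; on a big-endian host it stores `0x3412`, which is why the explicit shift-and-or form is the portable one.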
