
`movaps` vs. `movups` in GCC: how does it decide?


I recently investigated a segfault in a piece of software compiled with GCC 8. The code looked as follows (this is just a sketch):

    #include <cstdint>
    #include <new>

    struct Point
    {
      int64_t x, y;
    };

    struct Edge
    {
      // some other fields
      // ...
      Point p; // <- at offset `0xC0`

      Edge(const Point &p) : p(p) {}
    };

    Edge *create_edge(const Point &p)
    {
      void *raw_memory = my_custom_allocator(sizeof(Edge));
      return new (raw_memory) Edge(p);
    }

The key point here is that my_custom_allocator() returns pointers to unaligned memory. The code crashes because, in order to copy the original point p into the field Edge::p of the new object, the compiler used a movdqu/movaps pair in the [inlined] constructor code:

    movdqu 0x0(%rbp), %xmm1  ; read the original object at `rbp`...
    movaps %xmm1, 0xc0(%rbx) ; store it into the new `Edge` object at `rbx` - crash!
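For context: movaps faults when its memory operand is not 16-byte aligned, while movdqu and movups accept any address. Placement new, in turn, expects storage that is suitably aligned for the type being constructed. Below is a minimal sketch of a debug check that could be put between the allocator and the placement new to catch under-aligned storage early. The names `is_aligned` and `checked_storage` are hypothetical helpers made up for this illustration and are not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if `ptr` is a multiple of `alignment`.
// `alignment` is assumed to be a power of two.
inline bool is_aligned(const void *ptr, std::size_t alignment)
{
  return (reinterpret_cast<std::uintptr_t>(ptr) & (alignment - 1)) == 0;
}

// Hypothetical debug wrapper: asserts that `raw` satisfies alignof(T)
// before it is handed to placement new, then returns it unchanged.
template <typename T>
void *checked_storage(void *raw)
{
  assert(is_aligned(raw, alignof(T)) && "storage is under-aligned for T");
  return raw;
}
```

With this in place, the allocation in create_edge() could be written as `new (checked_storage<Edge>(raw_memory)) Edge(p)`. Note that the check only compares against alignof(T); it says nothing about whether the compiler has assumed a stricter alignment for some other reason.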

At first, everything seems to be clear here: the memory is not properly aligned, movaps crashes. My fault.

But is it?

Attempting to reproduce the problem on Godbolt, I observed that GCC 8 actually tries to handle this fairly intelligently. When it is sure that the memory is properly aligned, it uses movaps, just like in my code. This

    #include <new>
    #include <cstdlib>

    struct P { unsigned long long x, y; };

    unsigned char buffer[sizeof(P) * 100];

    void *alloc()
    {
      return buffer;
    }

    void foo(const P& s)
    {
      void *raw = alloc();
      new (raw) P(s);
    }

results in this

    foo(P const&):
        movdqu  xmm0, XMMWORD PTR [rsi]
        movaps  XMMWORD PTR buffer[rip], xmm0
        ret

https://godbolt.org/z/a3uSid

But when it is not sure, it uses movups. E.g. if I "hide" the definition of the allocator in the above example, it opts for movups in the same code:

    foo(P const&):
        push    rbx
        mov     rbx, rdi
        call    alloc()
        movdqu  xmm0, XMMWORD PTR [rbx]
        movups  XMMWORD PTR [rax], xmm0
        pop     rbx
        ret

https://godbolt.org/z/cNKe5A

So, if it is supposed to behave that way, why is it using movaps in the software I mentioned at the beginning of this post? In my case the implementation of my_custom_allocator() is not visible to the compiler at the point of the call, which is why I'd expect GCC to opt for movups.

What are the other factors that might be at play here? Is it a bug in GCC? How can I force GCC to use movups, preferably everywhere?

