I recently investigated a segfault in a piece of software compiled with GCC 8. The code looked as follows (this is just a sketch):

    struct Point
    {
        int64_t x, y;
    };

    struct Edge
    {
        // some other fields
        // ...
        Point p; // <- at offset `0xC0`

        Edge(const Point &p) : p(p) {}
    };

    Edge *create_edge(const Point &p)
    {
        void *raw_memory = my_custom_allocator(sizeof(Edge));
        return new (raw_memory) Edge(p);
    }

The key point here is that `my_custom_allocator()` returns pointers to unaligned memory. The code crashes because, in order to copy the original point `p` into the field `Edge::p` of the new object, the compiler used a `movdqu`/`movaps` pair in the [inlined] constructor code:

    movdqu 0x0(%rbp), %xmm1   ; read the original object at `rbp`...
    movaps %xmm1, 0xc0(%rbx)  ; store it into the new `Edge` object at `rbx` - crash!

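For reference, `movaps` faults when its memory operand is not 16-byte aligned, whereas `movdqu` and `movups` accept any address. Below is a minimal sketch of a runtime check that could confirm this diagnosis. `is_aligned` and `demo_alignment_check` are hypothetical helpers introduced for illustration; they are not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the original code): returns true when `ptr`
// is a multiple of `alignment`. `alignment` must be a non-zero power of two.
inline bool is_aligned(const void *ptr, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(ptr) & (alignment - 1)) == 0;
}

// Illustration: a 16-byte-aligned buffer and a pointer 8 bytes into it.
inline bool demo_alignment_check()
{
    alignas(16) static unsigned char storage[64];
    return is_aligned(storage, 16)        // start of the buffer: 16-byte aligned
        && !is_aligned(storage + 8, 16)   // 8 bytes in: not suitable for movaps
        && is_aligned(storage + 8, 8);    // ...but still fine for an int64_t
}
```

In the sketch at the top of the post, such a check could be placed right after the allocator call, for example `assert(is_aligned(raw_memory, alignof(Edge)));` inside `create_edge()`.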
At first, everything seems clear: the memory is not properly aligned, so `movaps` crashes. My fault.
But is it?
Attempting to reproduce the problem on Godbolt, I observe that GCC 8 actually tries to handle this fairly intelligently. When it is sure that the memory is properly aligned, it uses `movaps`, just like in my code. This

    #include <new>
    #include <cstdlib>

    struct P { unsigned long long x, y; };

    unsigned char buffer[sizeof(P) * 100];

    void *alloc()
    {
        return buffer;
    }

    void foo(const P& s)
    {
        void *raw = alloc();
        new (raw) P(s);
    }

results in this

    foo(P const&):
        movdqu  xmm0, XMMWORD PTR [rsi]
        movaps  XMMWORD PTR buffer[rip], xmm0
        ret

But when it is not sure, it uses `movups`. E.g. if I "hide" the definition of the allocator in the above example, it will opt for `movups` in the same code:

    foo(P const&):
        push    rbx
        mov     rbx, rdi
        call    alloc()
        movdqu  xmm0, XMMWORD PTR [rbx]
        movups  XMMWORD PTR [rax], xmm0
        pop     rbx
        ret

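One way to read the two outputs, offered as an assumption rather than a statement about GCC internals: when the allocator is opaque, the only alignment the compiler can rely on is the one the type itself promises, and for `P` that is less than the 16 bytes an aligned SSE store needs. The sketch below makes that comparison explicit. `kSseStoreAlignment`, `type_guarantees_sse_alignment` and `P16` are hypothetical names introduced for illustration only.

```cpp
#include <cstddef>

struct P { unsigned long long x, y; };

// Alignment required by an aligned 16-byte SSE store such as movaps.
// (Hypothetical constant name, used only in this sketch.)
constexpr std::size_t kSseStoreAlignment = 16;

// True when the type's own alignment requirement already implies that every
// correctly aligned object of type T starts on a 16-byte boundary.
template <typename T>
constexpr bool type_guarantees_sse_alignment()
{
    return alignof(T) % kSseStoreAlignment == 0;
}

// Over-aligned variant of P, for comparison only.
struct alignas(16) P16 { unsigned long long x, y; };

static_assert(type_guarantees_sse_alignment<P16>(),
              "alignas(16) promises 16-byte alignment");

// On x86-64, alignof(P) is typically 8, so type_guarantees_sse_alignment<P>()
// is false: the type alone does not justify an aligned 16-byte store.
```

Under this reading, an aligned store into an object would be justified only if the type (or something else the compiler can see, such as the definition of `buffer`) promises 16-byte alignment.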
So, if it is supposed to behave that way, why is it using `movaps` in the software I mentioned at the beginning of this post? In my case the implementation of `my_custom_allocator()` is not visible to the compiler at the point of the call, which is why I'd expect GCC to opt for `movups`.
What other factors might be at play here? Is it a bug in GCC? How can I force GCC to use `movups`, preferably everywhere?