How to return two 256-bit YMM values in registers YMM0/YMM1 from a function (no memory involved)

My goal is to return a 4x4 floating-point matrix from a function without going through memory. According to the Wikipedia article on x86 calling conventions (https://en.wikipedia.org/wiki/X86_calling_conventions#System_V_AMD64_ABI), the System V AMD64 ABI allows a function to return up to two floating-point values in XMM0 and XMM1.
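To illustrate the rule referred to above, here is a minimal sketch of the small-struct case. It assumes the System V AMD64 ABI (for example GCC or Clang on x86-64 Linux); the names `Pair` and `MakePair` are hypothetical and used only for illustration. A struct made of two `double` members fits in two eightbytes, both classified as SSE, so it should come back in XMM0 and XMM1 rather than through a hidden pointer.

```cpp
// Hypothetical example: a 16-byte struct of two doubles.
// Assumption: System V AMD64 ABI, where each 8-byte half is class SSE
// and the struct is returned in xmm0 (lo) and xmm1 (hi).
struct Pair
{
    double lo;
    double hi;
};

static_assert(sizeof(Pair) == 16, "Pair is expected to span two eightbytes");

// The arguments arrive in xmm0 and xmm1, which are also the return
// registers for Pair under the assumed ABI.
Pair MakePair(double a, double b)
{
    return {a, b};
}
```

Compiling this with something like `g++ -O2 -S` and reading the output is a quick way to check which registers are actually used on a given target; the same experiment with wider members shows where the behaviour changes.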

I tried this:

#include <immintrin.h> // __m256

struct Mat4 // just a simple struct for testing
{
    __m256 m0, m1;
};

// Arguments arrive in ymm0..ymm3; only m1 and m2 are returned.
Mat4 Foo(__m256 m0, __m256 m1, __m256 m2, __m256 m3)
{
    return {m1, m2};
}

But GCC returns the struct through memory, via a hidden pointer in RDI:

mov     %rdi,%rax
vmovaps %ymm1,(%rdi)
vmovaps %ymm2,0x20(%rdi)
retq 

I was expecting something like this:

vmovaps %ymm1, %ymm0
vmovaps %ymm2, %ymm1
retq

Is there any way to force GCC to return the whole struct Mat4 in just YMM0 and YMM1?

