Channel: Active questions tagged gcc - Stack Overflow

GCC for x86: Optimize sum of two couples of floats


I am compiling the code below with optimization enabled, and it still looks as though there should be a more efficient way of performing the two sums using the SIMD capability of the underlying hardware. What would be the right mix of GCC flags to generate assembly that loads the operands in pairs and executes the two additions in a single instruction?

#include <iostream>

struct foo {
    float val[2];

    foo(float a, float b)
    {
        val[0] = a;
        val[1] = b;
    }

    foo& operator+=(
            const foo &rhs)
    {
        val[0] += rhs.val[0];
        val[1] += rhs.val[1];
        return *this;
    }
};


int main(void)
{
    volatile float values[] = { 2.0, 3.0, 4.0, 7.0 };
    foo first(values[0], values[1]);
    foo second(values[2], values[3]);

    second += first;

    std::cout << "("<< second.val[0] << ","<< second.val[1] << ")"<< std::endl;

    return 0;
}

The generated assembly looks like this (for operator+=() alone); it seems pretty apparent that all operands are treated individually.

  400712:   c5 fa 10 4c 24 14       vmovss 0x14(%rsp),%xmm1
  400718:   c5 fa 10 5c 24 10       vmovss 0x10(%rsp),%xmm3
    foo second(values[2], values[3]);
  40071e:   c5 fa 10 44 24 1c       vmovss 0x1c(%rsp),%xmm0
  400724:   c5 fa 10 54 24 18       vmovss 0x18(%rsp),%xmm2
        val[1] += rhs.val[1];
  40072a:   c5 f2 58 e8             vaddss %xmm0,%xmm1,%xmm5
        val[0] += rhs.val[0];
  40072e:   c5 e2 58 e2             vaddss %xmm2,%xmm3,%xmm4
        val[1] += rhs.val[1];
  400732:   c5 fa 11 6c 24 0c       vmovss %xmm5,0xc(%rsp)
        val[0] += rhs.val[0];
  400738:   c5 fa 11 64 24 08       vmovss %xmm4,0x8(%rsp)

I compile with this command (but removing -mavx2 does not change the result much):

g++ -O3 -mavx2 -g -std=c++11 main.cpp -o run

In case it matters, this is GCC 6.3 (and I do not really have the freedom to upgrade).
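
To make the goal concrete, here is a rough sketch of the kind of code I would like the compiler to produce on its own, written by hand with SSE intrinsics from <xmmintrin.h>. It is an illustration under assumptions, not a claim about what any flag combination generates: it assumes an x86 target with SSE, and the choice of _mm_loadl_pi / _mm_storel_pi to move two floats at a time is mine. The single _mm_add_ps should map to one packed add covering both elements.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: _mm_loadl_pi, _mm_add_ps, _mm_storel_pi

struct foo {
    float val[2];

    foo(float a, float b)
    {
        val[0] = a;
        val[1] = b;
    }

    foo& operator+=(const foo &rhs)
    {
        // Load each pair of floats into the low 64 bits of an XMM register.
        // The upper two lanes come from `zero` and are never stored back.
        const __m128 zero = _mm_setzero_ps();
        __m128 lhs_v = _mm_loadl_pi(zero, reinterpret_cast<const __m64*>(val));
        __m128 rhs_v = _mm_loadl_pi(zero, reinterpret_cast<const __m64*>(rhs.val));

        // One packed add handles val[0] and val[1] together.
        __m128 sum = _mm_add_ps(lhs_v, rhs_v);

        // Store only the low two lanes back into this object.
        _mm_storel_pi(reinterpret_cast<__m64*>(val), sum);
        return *this;
    }
};
```

Used exactly like the original struct (second += first;), this replaces the two scalar adds with one packed add, at the cost of tying the code to x86.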

