Channel: Active questions tagged gcc - Stack Overflow

uint32_t * uint32_t = uint64_t vector multiplication with gcc

I'm trying to multiply vectors of uint32_t in gcc, producing the full 64-bit results in a uint64_t vector. I expect gcc to emit a single VPMULUDQ instruction. Instead, gcc emits code that shuffles the individual uint32_t elements of the source vectors around and then performs a full 64*64=64 multiplication. Here is what I've tried:

#include <stdint.h>

typedef uint32_t v8lu __attribute__ ((vector_size (32)));
typedef uint64_t v4llu __attribute__ ((vector_size (32)));

v4llu mul(v8lu x, v8lu y) {
    x[1] = 0; x[3] = 0; x[5] = 0; x[7] = 0;
    y[1] = 0; y[3] = 0; y[5] = 0; y[7] = 0;
    return (v4llu)x * (v4llu)y;
}

The first version masks out the unwanted parts of the uint32_t vectors, in the hope that gcc would optimize away the unneeded parts of the 64*64=64 multiplication and then see that the masking is pointless as well. No such luck.
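For comparison, the same masking can be written as a single vector AND on the 64-bit view instead of eight element assignments. This is only a sketch: the helper name mul_masked and the pointer-based interface are assumptions made so the snippet can be compiled and checked without AVX enabled, and nothing here guarantees that a given gcc version turns it into VPMULUDQ.

```c
#include <stdint.h>
#include <string.h>

typedef uint64_t v4llu __attribute__ ((vector_size (32)));

/* Hypothetical helper: multiply the low 32 bits of each 64-bit lane.
 * Vectors are moved through memory with memcpy so that no vector value
 * crosses a function boundary. */
void mul_masked(const uint64_t x[4], const uint64_t y[4], uint64_t out[4]) {
    /* Keeps the low 32 bits of every 64-bit lane, clears the high 32 bits. */
    const v4llu low32 = {0xFFFFFFFFu, 0xFFFFFFFFu, 0xFFFFFFFFu, 0xFFFFFFFFu};
    v4llu vx, vy, r;

    memcpy(&vx, x, sizeof vx);
    memcpy(&vy, y, sizeof vy);

    /* Both operands now fit in 32 bits, so the 64-bit product is exact. */
    r = (vx & low32) * (vy & low32);

    memcpy(out, &r, sizeof r);
}
```

On a little-endian target, clearing the high half of each 64-bit lane is the same as zeroing elements 1, 3, 5 and 7 of the uint32_t view.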

v4llu mul2(v8lu x, v8lu y) {
    v4llu tx = {x[0], x[2], x[4], x[6]};
    v4llu ty = {y[0], y[2], y[4], y[6]};
    return tx * ty;
}

Here I try to create the uint64_t vectors from scratch with only the used parts set. Again, gcc should see that the top 32 bits of each uint64_t are 0 and not do a full 64*64=64 multiply. Instead, a lot of extracting and reinserting of values happens, followed by a 64*64=64 multiply.

v4llu mul3(v8lu x, v8lu y) {
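    /* Widen one operand first; otherwise each product is computed in 32 bits. */
    v4llu t = {(uint64_t)x[0] * y[0], (uint64_t)x[2] * y[2],
               (uint64_t)x[4] * y[4], (uint64_t)x[6] * y[6]};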
    return t;
}

Let's build the result vector by multiplying the parts. Maybe gcc sees that it can use VPMULUDQ to achieve exactly that. No luck, it falls back to 4 IMUL opcodes.
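One C detail matters for any element-wise 32*32=64 product like this: multiplying two uint32_t values is carried out in 32 bits (assuming a 32-bit int, as on x86-64), and the result is only widened afterwards. The sketch below contrasts the two forms; both function names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: the product is computed as uint32_t, so the
 * upper 32 bits are lost before the value is converted to uint64_t. */
uint64_t mul_truncating(uint32_t a, uint32_t b) {
    return a * b;
}

/* Hypothetical helper: casting one operand first makes the multiplication
 * itself 64 bits wide, so the full product is kept. */
uint64_t mul_widening(uint32_t a, uint32_t b) {
    return (uint64_t)a * b;
}
```

With a = b = 0x10000, the first form yields 0 and the second yields 0x100000000.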

Is there a way to tell gcc what I want it to do (32*32=64 multiplication with everything perfectly placed)?
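To pin down what "perfectly placed" means, here is a scalar model of the operation in plain C: the even-indexed 32-bit elements of each input are multiplied, and each full 64-bit product fills one output lane. The function name widening_mul_ref and the array-based interface are assumptions for illustration only; this is a reference for checking results, not a way to make gcc emit VPMULUDQ.

```c
#include <stdint.h>

/* Hypothetical scalar reference: out[i] = x[2*i] * y[2*i] as a full
 * 64-bit product. The odd-indexed input elements are ignored. */
void widening_mul_ref(const uint32_t x[8], const uint32_t y[8], uint64_t out[4]) {
    for (int i = 0; i < 4; i++) {
        /* Cast before multiplying so the product is not truncated. */
        out[i] = (uint64_t)x[2 * i] * (uint64_t)y[2 * i];
    }
}
```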

Note: Inline asm or the intrinsic isn't the answer. Writing the opcode by hand obviously works. I want gcc to understand the problem and produce the right solution.

