I recently wrote some vector code and a corresponding godbolt example:
```c
typedef float v8f __attribute__((vector_size(32)));
typedef unsigned v8u __attribute__((vector_size(32)));

v8f f(v8f x)
{
    return __builtin_shuffle(x, (v8f){0}, (v8u){1, 2, 3, 4, 5, 6, 7, 8});
}
```
```asm
f:
        vmovaps    ymm1, ymm0
        vxorps     xmm0, xmm0, xmm0
        vperm2f128 ymm0, ymm1, ymm0, 33
        vpalignr   ymm0, ymm0, ymm1, 4
        ret
```
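For context: in `__builtin_shuffle(x, (v8f){0}, mask)`, mask indices 0–7 select lanes from `x` and indices 8–15 select lanes from the zero vector, so the mask `{1, 2, 3, 4, 5, 6, 7, 8}` just shifts the lanes down by one and pulls a zero into the top lane. A scalar sketch of that semantics, reusing the `v8f` typedef above (this loop is purely illustrative, not what gcc emits):

```c
/* Illustrative scalar equivalent of the shuffle above: shift all
   lanes down by one and fill the top lane with 0. */
v8f f_scalar(v8f x)
{
    v8f r;
    for (int i = 0; i < 7; i++)
        r[i] = x[i + 1];  /* mask indices 1..7 read from x         */
    r[7] = 0.0f;          /* mask index 8 reads lane 0 of (v8f){0} */
    return r;
}
```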
I wanted to see how the different optimization settings (`-O0`/`-O1`/`-O2`/`-O3`) affected the code, and all but `-O0` gave identical output. `-O0` gave the predictable frame-pointer garbage, and also copied the argument `x` to a stack local variable for no good reason. To fix this, I added the `register` storage class specifier:
```c
typedef float v8f __attribute__((vector_size(32)));
typedef unsigned v8u __attribute__((vector_size(32)));

v8f f(register v8f x)
{
    return __builtin_shuffle(x, (v8f){0}, (v8u){1, 2, 3, 4, 5, 6, 7, 8});
}
```
For `-O1`/`-O2`/`-O3`, the generated code is identical, but at `-O0`:
```asm
f:
        vxorps     xmm1, xmm1, xmm1
        vperm2f128 ymm1, ymm0, ymm1, 33
        vpalignr   ymm0, ymm1, ymm0, 4
        ret
```
`gcc` figured out how to avoid the redundant register copy. While such a copy might be move-eliminated by the CPU, it still increases code size for no benefit (`-Os` is bigger than `-O0`?).
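For completeness, here is a self-contained version of what I'm building; the `main` is only a sanity check I added, and it needs AVX2 enabled (e.g. `-mavx2`) since the shuffle lowers to `vperm2f128`/`vpalignr` on ymm registers:

```c
#include <stdio.h>

typedef float v8f __attribute__((vector_size(32)));
typedef unsigned v8u __attribute__((vector_size(32)));

v8f f(register v8f x)
{
    return __builtin_shuffle(x, (v8f){0}, (v8u){1, 2, 3, 4, 5, 6, 7, 8});
}

/* Sanity check: feed in 1..8 and print the shifted result. */
int main(void)
{
    v8f x = {1, 2, 3, 4, 5, 6, 7, 8};
    v8f y = f(x);
    for (int i = 0; i < 8; i++)
        printf("%g ", y[i]);   /* should print: 2 3 4 5 6 7 8 0 */
    putchar('\n');
    return 0;
}
```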
How/why does `gcc` generate better code for this at `-O0` than at `-O3`?