
long double (GCC specific) and __float128


I'm looking for detailed information on long double and __float128 in GCC/x86 (more out of curiosity than because of an actual problem).

Few people will probably ever need these (I've just, for the first time ever, truly needed a double), but I guess it is still worthwhile (and interesting) to know what you have in your toolbox and what it's about.

In that light, please excuse my somewhat open questions:

  1. Could someone explain the implementation rationale and intended usage of these types, also in comparison to each other? For example, are they "embarrassment implementations" because the standard allows for the type, and someone might complain if they're only just the same precision as double, or are they intended as first-class types?
  2. Alternatively, does someone have a good, usable web reference to share? A Google search on "long double" site:gcc.gnu.org/onlinedocs didn't give me much that's truly useful.
  3. Assuming that the common mantra "if you believe that you need double, you probably don't understand floating point" does not apply, i.e. you really need more precision than just float, and you don't care whether 8 or 16 bytes of memory are burnt... is it reasonable to expect that one can just as well jump to long double or __float128 instead of double without a significant performance impact?
  4. The "extended precision" feature of Intel CPUs has historically been source of nasty surprises when values were moved between memory and registers. If actually 96 bits are stored, the long double type should eliminate this issue. On the other hand, I understand that the long double type is mutually exclusive with -mfpmath=sse, as there is no such thing as "extended precision" in SSE. __float128, on the other hand, should work just perfectly fine with SSE math (though in absence of quad precision instructions certainly not on a 1:1 instruction base). Am I right in these assumptions?

(3. and 4. can probably be figured out with some work spent on profiling and disassembling, but maybe someone else has had the same thought previously and has already done that work; a crude timing skeleton follows below.)
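In case someone wants to try, this is the kind of crude timing skeleton I have in mind, not real profiling (it assumes POSIX clock_gettime; the loop body and iteration count are arbitrary). Note that __float128 arithmetic on x86 is emulated in software through libgcc, so one would expect it to be much slower:

#include <stdio.h>
#include <time.h>

typedef long double ldbl;   /* typedef so the macro gets the type as one token */

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

#define BENCH(T, name) do {                                 \
        T acc = (T)1.0;                                     \
        double t0 = now();                                  \
        for (long k = 0; k < 10000000L; ++k)                \
            acc = acc * (T)1.0000001 + (T)1e-9;             \
        printf("%-12s %.3f s (acc=%Lg)\n",                  \
               name, now() - t0, (long double)acc);         \
    } while (0)

int main(void)
{
    BENCH(double,     "double");
    BENCH(ldbl,       "long double");
    BENCH(__float128, "__float128");   /* softfloat, routed through libgcc */
    return 0;
}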

Background (this is the TL;DR part):
I initially stumbled over long double because I was looking up DBL_MAX in <float.h>, and incidentally LDBL_MAX is on the next line. "Oh look, GCC actually has 128-bit doubles, not that I need them, but... cool" was my first thought. Surprise, surprise: sizeof(long double) returns 12... wait, you mean 16?

The C and C++ standards unsurprisingly do not give a very concrete definition of the type. C99 (6.2.5 10) says that the numbers of double are a subset of long double whereas C++03 states (3.9.1 8) that long double has at least as much precision as double (which is the same thing, only worded differently). Basically, the standards leave everything to the implementation, in the same manner as with long, int, and short.
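At least one can ask the implementation what it actually provides through the <float.h> macros; the values in the comments are what x86 GCC typically reports, not something the standards promise:

#include <float.h>
#include <stdio.h>

int main(void)
{
    /* The standards only require long double >= double in precision */
    printf("DBL_MANT_DIG  = %d, DBL_DIG  = %d\n", DBL_MANT_DIG,  DBL_DIG);
    printf("LDBL_MANT_DIG = %d, LDBL_DIG = %d\n", LDBL_MANT_DIG, LDBL_DIG);
    /* x86 GCC typically: 53/15 for double, 64/18 for x87 extended */
    return 0;
}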

Wikipedia says that GCC uses "80-bit extended precision on x86 processors regardless of the physical storage used".

The GCC documentation states, all on the same page, that the size of the type is 96 bits because of the i386 ABI, but that no more than 80 bits of precision are enabled by any option (huh? what?), and that Pentium and newer processors want the type aligned as a 128-bit number. This 128-bit alignment is the default under 64 bits and can be manually enabled under 32 bits, resulting in 32 bits of zero padding.
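The size and alignment part is at least easy to observe (a sketch; __alignof__ is a GCC extension, and the results in the comment are what I'd expect on x86, not a guarantee):

#include <stdio.h>

int main(void)
{
    /* Only 10 bytes hold the actual x87 extended value; the rest is padding.
       Expected: sizeof 12 with -m32, 16 with -m64 or -m32 -m128bit-long-double. */
    printf("sizeof(long double)      = %zu\n", sizeof(long double));
    printf("__alignof__(long double) = %zu\n", (size_t)__alignof__(long double));
    return 0;
}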

Time to run a test:

#include <stdio.h>
#include <cfloat>

int main()
{
#ifdef USE_FLOAT128
    typedef __float128  long_double_t;
#else
    typedef long double long_double_t;
#endif

    long_double_t ld;

    /* Poison all 16 bytes so any padding that is never written back stays
       recognizable. (Punning through int* and touching i[3] when sizeof is
       only 12 isn't strictly legal, but it will do for a quick probe.) */
    int* i = (int*) &ld;
    i[0] = i[1] = i[2] = i[3] = 0xdeadbeef;

    /* \r overwrites the same console line on every iteration */
    for (ld = 0.0000000000000001; ld < LDBL_MAX; ld *= 1.0000001)
        printf("%08x-%08x-%08x-%08x\r", i[0], i[1], i[2], i[3]);

    return 0;
}

The output, when using long double, looks somewhat like this, with the marked digits being constant, and all others eventually changing as the numbers get bigger and bigger:

5636666b-c03ef3e0-00223fd8-deadbeef
                  ^^       ^^^^^^^^

This suggests that it is not an 80-bit number. An 80-bit number has 20 hex digits. I see 22 hex digits changing, which looks much more like a 96-bit number (24 hex digits). It also isn't a 128-bit number, since 0xdeadbeef isn't touched, which is consistent with sizeof returning 12.
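For what it's worth, the same probing works without the questionable int* cast: reading the object representation through unsigned char is always permitted. A sketch:

#include <stdio.h>

/* Print the raw bytes of an object, lowest address first */
static void dump(const void *p, size_t n)
{
    const unsigned char *b = (const unsigned char *)p;
    for (size_t k = 0; k < n; ++k)
        printf("%02x", b[k]);
    printf("  (sizeof = %zu)\n", n);
}

int main(void)
{
    long double ld = 0.0000000000000001L;
    dump(&ld, sizeof ld);
    return 0;
}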

The output for __float128 looks like it's really just a 128-bit number. All bits eventually flip.

Compiling with -m128bit-long-double does not, as the documentation indicates it should, align long double to 128 bits with 32 bits of zero padding. It doesn't use __float128 either, but it does indeed seem to align to 128 bits, padding with the value 0x7ffdd000(?!).

Further, LDBL_MAX seems to work as +inf for both long double and __float128. Adding or subtracting a number like 1.0E100 or 1.0E2000 to/from LDBL_MAX results in the same bit pattern.
Up to now, it was my belief that the foo_MAX constants were meant to hold the largest representable number that is not +inf (apparently that isn't the case?). I'm also not quite sure how an 80-bit number could conceivably act as +inf for a 128-bit value... maybe I'm just too tired at the end of the day and have done something wrong.
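A sanity check I should probably have run (a sketch using the C99 isinf classification macro, long double case only): if the sum simply rounds back down to LDBL_MAX, that would explain the unchanged bit pattern without any +inf being involved:

#include <float.h>
#include <math.h>
#include <stdio.h>

int main(void)
{
    /* Does adding 1.0e2000L really overflow LDBL_MAX to +inf,
       or does the result just round back to LDBL_MAX? */
    long double x = LDBL_MAX + 1.0e2000L;
    printf("isinf(x)      = %d\n", isinf(x));
    printf("x == LDBL_MAX = %d\n", x == LDBL_MAX);
    return 0;
}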

