Vectorize __float128 dot product with SIMD/AVX

In C++11 (on Linux, with gcc, on Intel Xeon) I have two __float128* arrays A and B of fixed size that fit entirely in the cache. Does anyone know of, or can anyone provide, code that computes the __float128 dot product of those arrays (i.e. the sum of their element-wise products), using SIMD/AVX acceleration where possible?

Unfortunately, neither MKL nor (as far as I know) any other efficient BLAS library supports __float128, so such acceleration would reduce the massive __float128 slowdown versus double to a point where we can actually use it.

We need __float128 for numerical stability reasons, so anything less precise is unfortunately not an option.
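For reference, the operation being asked about can be written as a plain scalar loop. This is only a baseline sketch, not a SIMD version: with gcc on x86-64, __float128 arithmetic is carried out by software routines rather than hardware instructions, and SSE/AVX provide no 128-bit floating-point arithmetic, so this loop is the thing any faster approach would have to beat. The function name dot_q is hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Baseline scalar dot product over two __float128 arrays of length n.
// dot_q is a hypothetical name; it is not part of any library.
__float128 dot_q(const __float128* a, const __float128* b, std::size_t n) {
    assert(a != nullptr && b != nullptr);  // precondition: valid arrays
    __float128 sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        // Each multiply and add is performed in quad precision.
        sum += a[i] * b[i];
    }
    return sum;
}
```

Called as dot_q(A, B, N) with the two arrays and their common length, it returns the sum of the element-wise products in quad precision.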
