Quantcast
Channel: Active questions tagged gcc - Stack Overflow
Viewing all articles
Browse latest Browse all 22288

GCC ARM Performance drop

$
0
0

I stumbled upon very strange issue with GCC. The issue is 25% drop in performance. Here is the story.

I have a pice of software which is fp32 compute intensive (neural networks compiled with TVM). I compile it for ARM (rk3399 device), here is info:

gcc -v

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-linux-gnueabihf/5/lto-wrapper
Target: arm-linux-gnueabihf
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.12' --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-5 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libitm --disable-libquadmath --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-armhf/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-armhf --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-armhf --with-arch-directory=arm --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --enable-multilib --disable-sjlj-exceptions --with-arch=armv7-a --with-fpu=vfpv3-d16 --with-float=hard --with-mode=thumb --disable-werror --enable-multilib --enable-checking=release --build=arm-linux-gnueabihf --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf
Thread model: posix
gcc version 5.4.0 20160609 (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.12)

uname -a

Linux FriendlyELEC 4.4.143 #1 SMP Tue Nov 20 11:10:11 CST 2018 aarch64 aarch64 aarch64 GNU/Linux

lscpu

Architecture:          aarch64 
Byte Order:            Little Endian
CPU(s):                6
On-line CPU(s) list:   0-5
Thread(s) per core:    1
Core(s) per socket:    3
Socket(s):             2
Model name:            ARMv8 Processor rev 2 (v8l)
CPU max MHz:           1800.0000
CPU min MHz:           408.0000
Hypervisor vendor:     horizontal
Virtualization type:   full

The code was initially "slow" and cpp11, I decided to try cpp17 and cpp14. cpp17 was not supported, but cpp14 was. I switched to cpp14 and voila I got boost around 25% in performance. I really tested it to make sure the boost is in fact real and not a measuring mistake. I had this boost for a week then my device rebooted and the boost in performance was gone!

It may sound crazy, but I'm very sure in my code and measurements I had. I didn't have explicit compile flags prior to this gimmick. Now I'm trying to figure out compile flags for GCC to reclaim what was lost, but I don't have much experience with GCC. What could be the issue here? What flags can affect performance that much?

the code uses .so files, compiled with use of llvm and gcc

llvm -device=arm_cpu -target=armv8l-linux-gnueabihf -mattr=+neon,fp-armv8

Viewing all articles
Browse latest Browse all 22288

Latest Images

Trending Articles



Latest Images