Channel: Active questions tagged gcc - Stack Overflow

What are the performance benefits of using GAS over C, and why is C still widely used? [closed]


Why do the main tools that exist today keep using C and keep spawning equally bad languages from it? Why does nobody want to reconsider the very concept of programming?

I did quite a lot of testing and settled on GAS (the GNU Assembler). GCC is supposed to use it when compiling C, but that is a complete lie. In my tests GAS was many times faster and safer than C. This led me to write the GASS language (GAS Script), first in Java, and then to rewrite it as a better compiler in Rust. At the moment I see no suitable language for this other than Rust. So, back to the beginning: why not use GASS if it is many times faster and safer than C? Besides, the languages are compatible: you can write assembler inserts and everything works the same as in C, only safer and much faster. It would also make it possible to use existing machines better instead of having to upgrade them.

Addition to the thoughts above: GASS is just one option that, by my measurements, gives up to several times more performance and eliminates leaks. My current implementation of GASS still has bugs, but it is enough to get the idea: https://github.com/miruji/GASS

So here it is. What have I tried? As I said above, GAS turns out to be the best assembler currently available for Linux, although it uses older paradigms. C is the only language that can compete with it in this regard, and it is the language on which everything else currently relies. The next step was to write one program in GAS and one in C.

Program compiled via GASS on pure GAS:

.section .data
# println_0
println_0_0:
  .string "Hello world!"
.section .text
.globl _start
_start:
# println_0
movl $println_0_0, %ecx
call println
exit:
  movl $1, %eax  # todo: exit func
  xorl %ebx, %ebx  # code 0
  int $0x80

Implementation of println in gsLib:

# # # # # # #
# gass.io.s
#
# # # # # # # # # #
# Description
#   new line print
# Input
#   no
# Output
#   no
# Dependencies
#   lns
#
.section .data
lns:
  .string "\n"
.section .text
ln:
  pushl %eax
  pushl %ebx
  pushl %ecx
  pushl %edx
  movl $4, %eax
  movl $1, %ebx
  movl $lns, %ecx
  movl $1, %edx
  int $0x80
  popl %edx
  popl %ecx
  popl %ebx
  popl %eax
ret
# # # # # # # # # # #
# Description
#   print
# Input
#   %ecx <-- string
# Output
#   no
# Dependencies
#   no
#
.section .text
.global print
print:
  pushl %eax
  pushl %ebx
  pushl %edx
  movl $4, %eax
  movl $1, %ebx
  call strlen
  int $0x80
  popl %edx
  popl %ebx
  popl %eax
ret
# # # # # # # # # # #
# Description
#   print + new line
# Input
#   %ecx <-- string
# Output
#   no
# Dependencies
#   print()
#   ln()
#
.section .text
.global println
println:
  call print
  call ln
ret
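For readers more at home in C, the three routines above can be sketched roughly as follows. This is only an illustration under assumptions: the names `gs_print`, `gs_ln` and `gs_println` are hypothetical, the file descriptor is a parameter here (the assembly always writes to descriptor 1), and libc's `strlen` stands in for the `strlen` routine that `print` calls but that is not shown in this post.

```c
#include <string.h>   /* strlen */
#include <unistd.h>   /* write, ssize_t */

/* Hypothetical C counterparts of the gsLib routines; names are illustrative. */

/* print: write the NUL-terminated string s to fd. */
ssize_t gs_print(int fd, const char *s)
{
    return write(fd, s, strlen(s));
}

/* ln: write a single newline character. */
ssize_t gs_ln(int fd)
{
    return write(fd, "\n", 1);
}

/* println: print followed by ln.
 * Returns the total number of bytes written, or -1 if either write fails. */
ssize_t gs_println(int fd, const char *s)
{
    ssize_t a = gs_print(fd, s);
    if (a < 0)
        return -1;
    ssize_t b = gs_ln(fd);
    if (b < 0)
        return -1;
    return a + b;
}
```

Calling `gs_println(1, "Hello world!")` would correspond to the `call println` in the generated program.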

Linux build platform x86-64, output file size 3.7KiB.

Program compiled on GCC C x86-32 asm, output file size 14.3KiB.

gcc -m32 -o gcc-x86-32-asm code.c

int main() {
    // AT&T
    asm volatile (
        "movl $13, %%edx;"
        "movl $message, %%ecx;"
        "movl $1, %%ebx;"
        "movl $4, %%eax;"
        "int $0x80;"
        :
        : "r" ("Hello, World!\n")
        : "%eax", "%ebx", "%ecx", "%edx"
    );
    return 0;
}
__asm__(".data\n"
"message:\n"
".ascii \"Hello, World!\\n\"");

Program compiled on GCC C x86-64 asm O3, output file size 14.9KiB.

gcc -m64 -O3 -no-pie -o gcc-x86-64-asm-O3 code.c

int main() {
    asm volatile (
        "movq $1, %%rax;"
        "movq $1, %%rdi;"
        "lea message(%%rip), %%rsi;"
        "movq $13, %%rdx;"
        "syscall;"
        :
        :
        : "%rax", "%rdi", "%rsi", "%rdx"
    );
    return 0;
}
__asm__(".data\n"
"message:\n"
".ascii \"Hello, World!\\n\"");
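One detail in both inline-asm variants: the length is hardcoded as 13, while "Hello, World!\n" is 14 bytes, so the trailing newline is never written. A minimal sketch of the same x86-64 write syscall with register constraints and a computed length could look like this. The names `raw_write` and `raw_puts` are hypothetical, and the non-x86-64 branch is only a portability fallback through libc, not part of the technique.

```c
#include <stddef.h>   /* size_t */
#include <string.h>   /* strlen */
#include <unistd.h>   /* write (fallback only) */

/* Hypothetical helper: issue the write syscall directly on x86-64 Linux. */
long raw_write(int fd, const void *buf, size_t len)
{
#if defined(__x86_64__) && defined(__linux__)
    long ret;
    __asm__ volatile (
        "syscall"
        : "=a"(ret)                /* rax: return value */
        : "0"(1L),                 /* rax: syscall number 1 = write */
          "D"((long)fd),           /* rdi: file descriptor */
          "S"(buf),                /* rsi: buffer */
          "d"(len)                 /* rdx: byte count */
        : "rcx", "r11", "memory"   /* syscall overwrites rcx and r11 */
    );
    return ret;                    /* negative errno value on failure */
#else
    return (long)write(fd, buf, len);  /* fallback: -1 on failure */
#endif
}

/* Write a NUL-terminated string; the length is computed, not hardcoded. */
long raw_puts(int fd, const char *s)
{
    return raw_write(fd, s, strlen(s));
}
```

Letting the compiler place the arguments through constraints also removes the need to list `rax`, `rdi`, `rsi` and `rdx` as clobbers by hand.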

Program compiled on GCC C x86-64 O3, output file size 15.1KiB.

gcc -m64 -O3 -o gcc-x86-64-O3 code.c

#include <stdio.h>
int main() {
    printf("Hello world!");
    return 0;
}

And now to the measurements. I will use perf and hyperfine.

perf:

Performance counter stats for './gass-x86-64':

    0.19 msec task-clock:u                     #    0.266 CPUs utilized
       0      context-switches:u               #    0.000 /sec
       0      cpu-migrations:u                 #    0.000 /sec
       8      page-faults:u                    #   42.974 K/sec
  48,695      cycles:u                         #    0.262 GHz
     126      stalled-cycles-frontend:u        #    0.26% frontend cycles idle
       0      stalled-cycles-backend:u
  45,263      instructions:u                   #    0.93  insn per cycle
                                               #    0.00  stalled cycles per insn
  13,812      branches:u                       #   74.194 M/sec
<not counted> branch-misses:u                      (0.00%)

0.000699908 seconds time elapsed

Performance counter stats for './gcc-x86-64-asm-O3':

    0.47 msec task-clock:u                     #    0.468 CPUs utilized
       0      context-switches:u               #    0.000 /sec
       0      cpu-migrations:u                 #    0.000 /sec
      50      page-faults:u                    #  106.662 K/sec
 127,088      cycles:u                         #    0.271 GHz                      (52.64%)
     508      stalled-cycles-frontend:u        #    0.40% frontend cycles idle
  12,742      stalled-cycles-backend:u         #   10.03% backend cycles idle
 107,803      instructions:u                   #    0.85  insn per cycle
                                               #    0.12  stalled cycles per insn
  21,484      branches:u                       #   45.831 M/sec
   2,533      branch-misses:u                  #   11.79% of all branches          (47.36%)

0.001001527 seconds time elapsed

Performance counter stats for './gcc-x86-64-O3':

    0.50 msec task-clock:u                     #    0.479 CPUs utilized
       0      context-switches:u               #    0.000 /sec
       0      cpu-migrations:u                 #    0.000 /sec
      55      page-faults:u                    #  109.592 K/sec
 112,431      cycles:u                         #    0.224 GHz                      (47.22%)
   1,038      stalled-cycles-frontend:u        #    0.92% frontend cycles idle
  13,652      stalled-cycles-backend:u         #   12.14% backend cycles idle
 111,840      instructions:u                   #    0.99  insn per cycle
                                               #    0.12  stalled cycles per insn
  22,275      branches:u                       #   44.385 M/sec
   3,092      branch-misses:u                  #   13.88% of all branches          (52.78%)

0.001048027 seconds time elapsed

Performance counter stats for './gcc-x86-32-asm':

    0.53 msec task-clock:u                     #    0.505 CPUs utilized
       0      context-switches:u               #    0.000 /sec
       0      cpu-migrations:u                 #    0.000 /sec
      41      page-faults:u                    #   77.815 K/sec
       0      cycles:u                                                             (10.93%)
     558      stalled-cycles-frontend:u
  18,766      stalled-cycles-backend:u
 118,889      instructions:u
                                               #    0.16  stalled cycles per insn
  25,207      branches:u                       #   47.841 M/sec
   2,539      branch-misses:u                  #   10.07% of all branches          (89.07%)

0.001043728 seconds time elapsed

hyperfine:

Benchmark 1: ./gass-x86-64

  Time (mean ±σ):     212.4 µs ± 186.2 µs    [User: 174.6 µs, System: 325.1 µs]
  Range (min … max):     0.0 µs … 3013.3 µs    1282 runs

Benchmark 1: ./gcc-x86-32-asm

  Time (mean ±σ):     484.3 µs ± 212.8 µs    [User: 227.6 µs, System: 591.1 µs]
  Range (min … max):   241.9 µs … 3453.6 µs    1144 runs

Benchmark 1: ./gcc-x86-64-asm-O3

  Time (mean ±σ):     378.3 µs ± 246.5 µs    [User: 189.3 µs, System: 543.0 µs]
  Range (min … max):   140.2 µs … 3494.5 µs    1229 runs

Benchmark 1: ./gcc-x86-64-O3

  Time (mean ±σ):     419.3 µs ± 239.2 µs    [User: 210.8 µs, System: 582.2 µs]
  Range (min … max):   162.3 µs … 3576.2 µs    1195 runs

perf (1000 runs):

Performance counter stats for './gass-x86-64' (1000 runs):

     0.16 msec task-clock:u                     #    0.270 CPUs utilized           ( +-  1.91% )
        0      context-switches:u               #    0.000 /sec
        0      cpu-migrations:u                 #    0.000 /sec
        8      page-faults:u                    #   49.888 K/sec                   ( +-  0.04% )
   35,376      cycles:u                         #    0.221 GHz                     ( +-  0.92% )
      110      stalled-cycles-frontend:u        #    0.31% frontend cycles idle    ( +-  3.40% )
    3,036      stalled-cycles-backend:u         #    8.58% backend cycles idle     ( +-  2.48% )
   45,263      instructions:u                   #    1.28  insn per cycle
                                                #    0.07  stalled cycles per insn ( +-  0.00% )
   13,812      branches:u                       #   86.131 M/sec                   ( +-  0.00% )
<not counted>  branch-misses:u                                                     ( +- 10.41% )  (0.00%)

0.00059383 +- 0.00000493 seconds time elapsed  ( +-  0.83% )

Performance counter stats for './gcc-x86-32-asm' (1000 runs):

     0.42 msec task-clock:u                     #    0.486 CPUs utilized           ( +-  1.12% )
        0      context-switches:u               #    0.000 /sec
        0      cpu-migrations:u                 #    0.000 /sec
       43      page-faults:u                    #  101.720 K/sec                   ( +-  0.12% )
  148,786      cycles:u                         #    0.352 GHz                     ( +-  1.09% )
      620      stalled-cycles-frontend:u        #    0.42% frontend cycles idle    ( +-  1.69% )
    9,422      stalled-cycles-backend:u         #    6.33% backend cycles idle     ( +-  2.61% )
  118,893      instructions:u                   #    0.80  insn per cycle
                                                #    0.08  stalled cycles per insn ( +-  0.00% )
   25,209      branches:u                       #   59.634 M/sec                   ( +-  0.00% )
<not counted>  branch-misses:u                                                     ( +-  5.20% )  (0.00%)

0.00086995 +- 0.00000699 seconds time elapsed  ( +-  0.80% )

Performance counter stats for './gcc-x86-64-asm-O3' (1000 runs):

     0.37 msec task-clock:u                     #    0.453 CPUs utilized           ( +-  1.47% )
        0      context-switches:u               #    0.000 /sec
        0      cpu-migrations:u                 #    0.000 /sec
       49      page-faults:u                    #  133.347 K/sec                   ( +-  0.06% )
  137,464      cycles:u                         #    0.374 GHz                     ( +-  1.03% )
      682      stalled-cycles-frontend:u        #    0.50% frontend cycles idle    ( +-  1.90% )
    7,053      stalled-cycles-backend:u         #    5.13% backend cycles idle     ( +-  2.62% )
  107,806      instructions:u                   #    0.78  insn per cycle
                                                #    0.07  stalled cycles per insn ( +-  0.00% )
   21,483      branches:u                       #   58.463 M/sec                   ( +-  0.00% )
<not counted>  branch-misses:u                                                     ( +-  6.53% )  (0.00%)

0.00081057 +- 0.00000617 seconds time elapsed  ( +-  0.76% )

Performance counter stats for './gcc-x86-64-O3' (1000 runs):

     0.40 msec task-clock:u                     #    0.474 CPUs utilized           ( +-  1.92% )
        0      context-switches:u               #    0.000 /sec
        0      cpu-migrations:u                 #    0.000 /sec
       55      page-faults:u                    #  136.525 K/sec                   ( +-  0.06% )
  152,656      cycles:u                         #    0.379 GHz                     ( +-  0.93% )
      736      stalled-cycles-frontend:u        #    0.48% frontend cycles idle    ( +-  1.96% )
    7,560      stalled-cycles-backend:u         #    4.95% backend cycles idle     ( +-  2.63% )
  112,884      instructions:u                   #    0.74  insn per cycle
                                                #    0.07  stalled cycles per insn ( +-  0.01% )
   22,465      branches:u                       #   55.764 M/sec                   ( +-  0.00% )
<not counted>  branch-misses:u                                                     ( +-  5.86% )  (0.00%)

0.00084920 +- 0.00000869 seconds time elapsed  ( +-  1.02% )
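To put the 1000-run figures side by side, here is a small sketch that divides each GCC binary's elapsed time and user-space cycle count by the GASS binary's. The numbers are copied from the perf output above; the helper names `ratio` and `print_ratios` are hypothetical, and this only restates the measurements, it does not add new ones.

```c
#include <stdio.h>

/* How many times larger `other` is than `baseline`. */
double ratio(double baseline, double other)
{
    return other / baseline;
}

/* Figures copied from the "perf (1000 runs)" output in this post. */
struct run {
    const char *name;
    double elapsed_s;   /* "seconds time elapsed" */
    double cycles;      /* "cycles:u" */
};

static const struct run runs[] = {
    { "gass-x86-64",       0.00059383,  35376.0 },
    { "gcc-x86-32-asm",    0.00086995, 148786.0 },
    { "gcc-x86-64-asm-O3", 0.00081057, 137464.0 },
    { "gcc-x86-64-O3",     0.00084920, 152656.0 },
};

/* Print each GCC binary relative to runs[0], the GASS binary. */
void print_ratios(void)
{
    size_t n = sizeof runs / sizeof runs[0];
    for (size_t i = 1; i < n; i++) {
        printf("%-18s elapsed x%.2f  cycles x%.2f\n",
               runs[i].name,
               ratio(runs[0].elapsed_s, runs[i].elapsed_s),
               ratio(runs[0].cycles, runs[i].cycles));
    }
}
```

With these inputs the elapsed-time ratios come out at roughly 1.4 to 1.5, and the user-space cycle ratios at roughly 3.9 to 4.3.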

In general, that's all for this post. Perhaps I'm missing something, which is why I decided to share my thoughts in this form. With this compiled language you could write a new Python that, according to my measurements, would be guaranteed to be 10 times faster, and up to 20 times at most, which would be good for an interpreted language.

