Why do today's main tools still rely on C and keep spawning equally bad languages derived from it? Why does no one want to rethink the very concept of programming?
I did quite a lot of testing and settled on GAS (GNU Assembler). GCC's C compiler supposedly relies on it, but that is a complete lie. GAS is many times faster and safer than C. This led me to write the GASS language (GAS Script), first with a compiler in Java, and then to rewrite a better compiler in Rust; at the moment, Rust is simply the only language suitable for this. So, back to the original question: why not use GASS, if it is many times faster and safer than C? Besides, the languages are compatible with each other: you can write assembler inserts just as in C, but safely and much faster. It would also let us stop chasing ever more powerful machines and instead make better use of the ones we already have.
**Addendum (from here on, the original post is left unchanged):** GASS is just one option, but one that guarantees up to several times higher performance and eliminates leaks. My current implementation of GASS, which still has bugs but conveys the essence: https://github.com/miruji/GASS
So, what did I try? As I said above, GAS turns out to be the best assembler currently available for Linux, although it follows older paradigms. In this respect, the only language that can compete with it is C, the language everything else currently relies on. So the next step is to write the same program in GAS and in C.
A program compiled by GASS into pure GAS:
    .section .data
    # println_0
    println_0_0: .string "Hello world!"
    .section .text
    .globl _start
    _start:
    # println_0
        movl $println_0_0, %ecx
        call println
    exit:
        movl $1, %eax      # todo: exit func
        xorl %ebx, %ebx    # code 0
        int $0x80
Implementation of println in gsLib:
    # # # # # # #
    # gass.io.s
    #
    # # # # # # # # # #
    #
    # Description
    #   new line print
    # Input
    #   no
    # Output
    #   no
    # Dependencies
    #   lns
    #
    .section .data
    lns: .string "\n"
    .section .text
    ln:
        pushl %eax
        pushl %ebx
        pushl %ecx
        pushl %edx
        movl $4, %eax
        movl $1, %ebx
        movl $lns, %ecx
        movl $1, %edx
        int $0x80
        popl %edx
        popl %ecx
        popl %ebx
        popl %eax
    ret
    # # # # # # # # # #
    #
    # Description
    #   print
    # Input
    #   %ecx <-- string
    # Output
    #   no
    # Dependencies
    #   strlen
    #
    .section .text
    .global print
    print:
        pushl %eax
        pushl %ebx
        pushl %edx
        movl $4, %eax
        movl $1, %ebx
        call strlen    # gsLib helper (not shown); presumably leaves the length of the %ecx string in %edx
        int $0x80
        popl %edx
        popl %ebx
        popl %eax
    ret
    # # # # # # # # # #
    #
    # Description
    #   print + new line
    # Input
    #   %ecx <-- string
    # Output
    #   no
    # Dependencies
    #   print()
    #   ln()
    #
    .section .text
    .global println
    println:
        call print
        call ln
    ret
Build platform: Linux x86-64; output file size: 3.7 KiB.
Program compiled with GCC (C with x86-32 inline asm); output file size: 14.3 KiB.

    gcc -m32 -o gcc-x86-32-asm code.c
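    int main() {
        // AT&T syntax: write(1, message, 13) via the 32-bit int $0x80 interface
        asm volatile (
            "movl $13, %%edx;"        // length: 13 bytes, i.e. "Hello, World!" without the trailing "\n"
            "movl $message, %%ecx;"   // buffer
            "movl $1, %%ebx;"         // fd 1 (stdout)
            "movl $4, %%eax;"         // sys_write (32-bit ABI)
            "int $0x80;"
            :
            : "r" ("Hello, World!\n") // unused input operand
            : "%eax", "%ebx", "%ecx", "%edx"
        );
        return 0;
    }
    __asm__(".data\n"
            "message:\n"
            ".ascii \"Hello, World!\\n\"");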
Program compiled with GCC (C with x86-64 inline asm, -O3); output file size: 14.9 KiB.

    gcc -m64 -O3 -no-pie -o gcc-x86-64-asm-O3 code.c
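    int main() {
        // write(1, message, 13) via the x86-64 syscall instruction
        asm volatile (
            "movq $1, %%rax;"            // sys_write (x86-64 ABI)
            "movq $1, %%rdi;"            // fd 1 (stdout)
            "lea message(%%rip), %%rsi;" // buffer
            "movq $13, %%rdx;"           // length: 13 bytes, without the trailing "\n"
            "syscall;"
            :
            :
            // syscall itself also overwrites %rcx and %r11, so they are listed as clobbered too
            : "%rax", "%rdi", "%rsi", "%rdx", "%rcx", "%r11"
        );
        return 0;
    }
    __asm__(".data\n"
            "message:\n"
            ".ascii \"Hello, World!\\n\"");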
Program compiled with GCC (plain C, x86-64, -O3); output file size: 15.1 KiB.

    gcc -m64 -O3 -o gcc-x86-64-O3 code.c
    #include <stdio.h>

    int main() {
        printf("Hello world!");
        return 0;
    }
And now for the measurements. I used perf and hyperfine.

perf:
Performance counter stats for './gass-x86-64':
    0.19 msec task-clock:u # 0.266 CPUs utilized
    0 context-switches:u # 0.000 /sec
    0 cpu-migrations:u # 0.000 /sec
    8 page-faults:u # 42.974 K/sec
    48,695 cycles:u # 0.262 GHz
    126 stalled-cycles-frontend:u # 0.26% frontend cycles idle
    0 stalled-cycles-backend:u
    45,263 instructions:u # 0.93 insn per cycle # 0.00 stalled cycles per insn
    13,812 branches:u # 74.194 M/sec
    <not counted> branch-misses:u (0.00%)

    0.000699908 seconds time elapsed
Performance counter stats for './gcc-x86-64-asm-O3':
    0.47 msec task-clock:u # 0.468 CPUs utilized
    0 context-switches:u # 0.000 /sec
    0 cpu-migrations:u # 0.000 /sec
    50 page-faults:u # 106.662 K/sec
    127,088 cycles:u # 0.271 GHz (52.64%)
    508 stalled-cycles-frontend:u # 0.40% frontend cycles idle
    12,742 stalled-cycles-backend:u # 10.03% backend cycles idle
    107,803 instructions:u # 0.85 insn per cycle # 0.12 stalled cycles per insn
    21,484 branches:u # 45.831 M/sec
    2,533 branch-misses:u # 11.79% of all branches (47.36%)

    0.001001527 seconds time elapsed
Performance counter stats for './gcc-x86-64-O3':
    0.50 msec task-clock:u # 0.479 CPUs utilized
    0 context-switches:u # 0.000 /sec
    0 cpu-migrations:u # 0.000 /sec
    55 page-faults:u # 109.592 K/sec
    112,431 cycles:u # 0.224 GHz (47.22%)
    1,038 stalled-cycles-frontend:u # 0.92% frontend cycles idle
    13,652 stalled-cycles-backend:u # 12.14% backend cycles idle
    111,840 instructions:u # 0.99 insn per cycle # 0.12 stalled cycles per insn
    22,275 branches:u # 44.385 M/sec
    3,092 branch-misses:u # 13.88% of all branches (52.78%)

    0.001048027 seconds time elapsed
Performance counter stats for './gcc-x86-32-asm':
    0.53 msec task-clock:u # 0.505 CPUs utilized
    0 context-switches:u # 0.000 /sec
    0 cpu-migrations:u # 0.000 /sec
    41 page-faults:u # 77.815 K/sec
    0 cycles:u (10.93%)
    558 stalled-cycles-frontend:u
    18,766 stalled-cycles-backend:u
    118,889 instructions:u # 0.16 stalled cycles per insn
    25,207 branches:u # 47.841 M/sec
    2,539 branch-misses:u # 10.07% of all branches (89.07%)

    0.001043728 seconds time elapsed
hyperfine:

Benchmark 1: ./gass-x86-64

    Time (mean ± σ): 212.4 µs ± 186.2 µs [User: 174.6 µs, System: 325.1 µs]
    Range (min … max): 0.0 µs … 3013.3 µs 1282 runs
Benchmark 1: ./gcc-x86-32-asm
    Time (mean ± σ): 484.3 µs ± 212.8 µs [User: 227.6 µs, System: 591.1 µs]
    Range (min … max): 241.9 µs … 3453.6 µs 1144 runs
Benchmark 1: ./gcc-x86-64-asm-O3
    Time (mean ± σ): 378.3 µs ± 246.5 µs [User: 189.3 µs, System: 543.0 µs]
    Range (min … max): 140.2 µs … 3494.5 µs 1229 runs
Benchmark 1: ./gcc-x86-64-O3
    Time (mean ± σ): 419.3 µs ± 239.2 µs [User: 210.8 µs, System: 582.2 µs]
    Range (min … max): 162.3 µs … 3576.2 µs 1195 runs
perf (1000 runs):

Performance counter stats for './gass-x86-64' (1000 runs):
    0.16 msec task-clock:u # 0.270 CPUs utilized ( +- 1.91% )
    0 context-switches:u # 0.000 /sec
    0 cpu-migrations:u # 0.000 /sec
    8 page-faults:u # 49.888 K/sec ( +- 0.04% )
    35,376 cycles:u # 0.221 GHz ( +- 0.92% )
    110 stalled-cycles-frontend:u # 0.31% frontend cycles idle ( +- 3.40% )
    3,036 stalled-cycles-backend:u # 8.58% backend cycles idle ( +- 2.48% )
    45,263 instructions:u # 1.28 insn per cycle # 0.07 stalled cycles per insn ( +- 0.00% )
    13,812 branches:u # 86.131 M/sec ( +- 0.00% )
    <not counted> branch-misses:u ( +- 10.41% ) (0.00%)

    0.00059383 +- 0.00000493 seconds time elapsed ( +- 0.83% )
Performance counter stats for './gcc-x86-32-asm' (1000 runs):
    0.42 msec task-clock:u # 0.486 CPUs utilized ( +- 1.12% )
    0 context-switches:u # 0.000 /sec
    0 cpu-migrations:u # 0.000 /sec
    43 page-faults:u # 101.720 K/sec ( +- 0.12% )
    148,786 cycles:u # 0.352 GHz ( +- 1.09% )
    620 stalled-cycles-frontend:u # 0.42% frontend cycles idle ( +- 1.69% )
    9,422 stalled-cycles-backend:u # 6.33% backend cycles idle ( +- 2.61% )
    118,893 instructions:u # 0.80 insn per cycle # 0.08 stalled cycles per insn ( +- 0.00% )
    25,209 branches:u # 59.634 M/sec ( +- 0.00% )
    <not counted> branch-misses:u ( +- 5.20% ) (0.00%)

    0.00086995 +- 0.00000699 seconds time elapsed ( +- 0.80% )
Performance counter stats for './gcc-x86-64-asm-O3' (1000 runs):
    0.37 msec task-clock:u # 0.453 CPUs utilized ( +- 1.47% )
    0 context-switches:u # 0.000 /sec
    0 cpu-migrations:u # 0.000 /sec
    49 page-faults:u # 133.347 K/sec ( +- 0.06% )
    137,464 cycles:u # 0.374 GHz ( +- 1.03% )
    682 stalled-cycles-frontend:u # 0.50% frontend cycles idle ( +- 1.90% )
    7,053 stalled-cycles-backend:u # 5.13% backend cycles idle ( +- 2.62% )
    107,806 instructions:u # 0.78 insn per cycle # 0.07 stalled cycles per insn ( +- 0.00% )
    21,483 branches:u # 58.463 M/sec ( +- 0.00% )
    <not counted> branch-misses:u ( +- 6.53% ) (0.00%)

    0.00081057 +- 0.00000617 seconds time elapsed ( +- 0.76% )
Performance counter stats for './gcc-x86-64-O3' (1000 runs):
    0.40 msec task-clock:u # 0.474 CPUs utilized ( +- 1.92% )
    0 context-switches:u # 0.000 /sec
    0 cpu-migrations:u # 0.000 /sec
    55 page-faults:u # 136.525 K/sec ( +- 0.06% )
    152,656 cycles:u # 0.379 GHz ( +- 0.93% )
    736 stalled-cycles-frontend:u # 0.48% frontend cycles idle ( +- 1.96% )
    7,560 stalled-cycles-backend:u # 4.95% backend cycles idle ( +- 2.63% )
    112,884 instructions:u # 0.74 insn per cycle # 0.07 stalled cycles per insn ( +- 0.01% )
    22,465 branches:u # 55.764 M/sec ( +- 0.00% )
    <not counted> branch-misses:u ( +- 5.86% ) (0.00%)

    0.00084920 +- 0.00000869 seconds time elapsed ( +- 1.02% )
That's all for this post. Perhaps I'm missing something, which is why I decided to share my thoughts in this form. Using this compiled language, one could write a new Python that would be guaranteed to be 10 times faster (by my measurements, up to 20 times at most), which would be quite good for an interpreted language.