Quantcast
Channel: Active questions tagged gcc - Stack Overflow
Viewing all articles
Browse latest Browse all 22004

One thread counting, other thread does a job and measurement

$
0
0

I would like to implement a 2 thread model where 1 is counting (infinitely increment a value) and the other one is recording the first counter, do the job, record the second recording and measure the time elapsed between.

Here is what I have done so far:

// global counter
register unsigned long counter asm("r13");
// unsigned long counter;

void* counter_thread(){
    // affinity is set to some isolated CPU so the noise will be minimal

    while(1){
        //counter++; // Line 1*
        asm volatile("add $1, %0" : "+r"(counter) : ); // Line 2*
    }
}

void* measurement_thread(){
    // affinity is set somewhere over here
    unsigned long meas = 0;
    unsigned long a = 5;
    unsigned long r1,r2;
    sleep(1.0);
    while(1){
        mfence();
        r1 = counter;
        a *=3; // dummy operation that I want to measure
        r2 = counter;
        mfence();
        meas = r2-r1;
        printf("counter:%ld \n", counter);
        break;
    }
}

Let me explain what I have done so far:

Since I want the counter to be accurate, I am setting the affinity to an isolated CPU. Also, If I use the counter in Line 1*, the dissassambled function will be:

 d4c:   4c 89 e8                mov    %r13,%rax
 d4f:   48 83 c0 01             add    $0x1,%rax
 d53:   49 89 c5                mov    %rax,%r13
 d56:   eb f4                   jmp    d4c <counter_thread+0x37>

Which is not 1 cycle operation. That is why I have used inline assembly to decrease 2 mov instructions. Using the inline assembly:

 d4c:   49 83 c5 01             add    $0x1,%r13
 d50:   eb fa                   jmp    d4c <counter_thread+0x37>

But the thing is, both implementations are not working. The other thread cannot see the counter being updated. If I make the global counter value not a register, then it is working, but I want to be precise. If I make global counter value to unsigned long counter then the disassembled code of counter thread is:

 d4c:   48 8b 05 ed 12 20 00    mov    0x2012ed(%rip),%rax        # 202040 <counter>
 d53:   48 83 c0 01             add    $0x1,%rax
 d57:   48 89 05 e2 12 20 00    mov    %rax,0x2012e2(%rip)        # 202040 <counter>
 d5e:   eb ec                   jmp    d4c <counter_thread+0x37>

It works but it doesn't give me the granularity that I want.

EDIT:

My environment:

  • CPU: AMD Ryzen 3600
  • kernel: 5.0.0-32-generic
  • OS: Ubuntu 18.04

EDIT2: I have isolated 2 neighbor CPU cores (i.e. core 10 and 11) and running the experiment on those cores. The counter is on one of the cores, measurement is on the other. Isolation is done by using /etc/default/grub file and adding isolcpus line.

EDIT3: I know that one measurement is not enough. I have run the experiment 10 million times and looked at the results.

Experiment1: Setup:

unsigned long counter =0;//global counter 
void* counter_thread(){
    mfence();
    while(1)
        counter++;
}
void* measurement_thread(){
    unsigned long i=0, r1=0,r2=0;
    unsigned int a=0;
    sleep(1.0);
    while(1){
        mfence();
        r1 = counter;
        a +=3;
        r2 = counter;
        mfence();
        measurements[r2-r1]++;
        i++;
        if(i == MILLION_ITER)
            break;   
    }
}

Results1: In 99.99% I got 0. Which I expect because either first thread is not running, or OS or other interrupts disturb the measurement. Getting rid of the 0's and very high values gives me 20 cycles of measurement on the average. (I was expecting 3-4 because I only do an integer addition).

Experiment2:

Setup: Identically the same as above, one difference is, instead of global counter, I use the counter as register:

register unsigned long counter asm("r13");

Results2: Measurement thread always reads 0. In disassembled code, I can see that both are dealing with R13 register (counter), however, I believe that it is not somehow shared.

Experiment3:

Setup: Identical to the setup2, except in the counter thread, instead of doing counter++, I am doing an inline assembly to make sure that I am doing 1 cycle operation. My disassembled file looks like this:

 cd1:   49 83 c5 01             add    $0x1,%r13
 cd5:   eb fa                   jmp    cd1 <counter_thread+0x37>

Results3: Measurement thread reads 0 as above.


Viewing all articles
Browse latest Browse all 22004

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>