#1
"Mark"
Apr 2003
Between here and the
2×3²×353 Posts
I'm posting this to the group because I cannot figure it out. I'm working on a sieve for the Primes in π project. The sieve is generic enough that it can be used by similar projects. The problem I'm running into is that the results returned from the assembler function I wrote are not correct when I use any optimization, and I have no idea why.

For example, on my Mac, if I compile with this:
gcc -m64 *.S *.c -o pixsieve
and run with this:
./pixsieve -L20000 -spi.txt -S10 -P1e6 -m1e9 -iterms.txt -Rterms.pfgw -N
it returns the correct results (pi.txt is a file with the first million decimal digits of pi). If I compile with this:
gcc -O1 -m64 *.S *.c -o pixsieve
and run with the same command line, then I get the wrong results.

The code makes a call to an asm routine called pixsieve(). One of the parameters to this routine is an array of values that holds the results of some mulmods. The build without optimization returns the correct values for the mulmods; the optimized build gives me 0 for all of them. I have no idea why optimizing would break the code. I'm hoping that one of the asm gurus on this forum can help me figure out what I am doing wrong. There are some lines in the code that show the results from the mulmods.

BTW, on Windows, this same code always returns 0s for all mulmods regardless of optimization level. Finally, Ernst did suggest that I inline the code, which would likely solve the problem, but I would like to understand why it fails in the first place before I make such a change.
#2
"Mark"
Apr 2003
Between here and the
2×3²×353 Posts
Although I don't have a solution to this specific problem, I have found a different way to keep using the asm without relying on output values from it. If anyone can figure out the root cause, I would appreciate it for my own education.
#3
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
2×3³×109 Posts
-O1 turns on various optimization flags:
-fdefer-pop -fmerge-constants -fthread-jumps -floop-optimize -fcrossjumping -fif-conversion -fif-conversion2 -fdelayed-branch -fguess-branch-probability -fcprop-registers
Which one causes the problem? Does -Wall suggest anything?
#4
Nov 2016
210 Posts
I think the problem is that the pixsieve function is declared with __attribute__((pure)) even though it modifies memory via the pointer passed as its third argument, which pure functions are not allowed to do.
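To illustrate the point about the attribute, here is a minimal sketch rather than the actual pixsieve code. The name fill_mulmods, its signature, and its C body are hypothetical stand-ins for an asm routine that writes results through a pointer argument. GCC treats a pure function as having no observable effect other than its return value, so an optimizing build may assume the output array is unchanged by the call, or drop the call entirely. The declaration below therefore omits the attribute.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical prototype standing in for the asm routine. It writes into
 * out[], which is a side effect visible to the caller, so it must NOT be
 * declared __attribute__((pure)). The broken form would look like:
 *
 *   size_t fill_mulmods(uint64_t a, uint64_t p, uint64_t *out, size_t n)
 *       __attribute__((pure));
 */
size_t fill_mulmods(uint64_t a, uint64_t p, uint64_t *out, size_t n);

/* Plain C body, assumed here only so the sketch compiles and runs.
 * Stores a^1, a^2, ..., a^n (mod p) into out[] and returns n.
 * p must be nonzero. */
size_t fill_mulmods(uint64_t a, uint64_t p, uint64_t *out, size_t n)
{
    uint64_t x = 1 % p;
    for (size_t i = 0; i < n; i++) {
        /* mulmod with a 128-bit intermediate (GCC/Clang extension) */
        x = (uint64_t)(((unsigned __int128)x * a) % p);
        out[i] = x;
    }
    return n;
}
```

With the attribute removed, the compiler has to assume the call may write to out[], so the caller reads the stored values back after the call at any optimization level.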
#5
Nov 2016
2 Posts
Also, the Windows code stores the fourth argument as a 64-bit value with "mov %r9, mult", while the non-Windows code stores it as a 32-bit value with "mov %ecx, mult". Later it is reloaded as a 64-bit value with "mov mult, %r12", which means the non-Windows code will have junk in the upper 32 bits of the reloaded value.
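The width mismatch can be mimicked in C as a rough sketch. The function names here are hypothetical, the 8-byte slot stands in for the mult variable, and a little-endian target such as x86-64 is assumed. A 4-byte store leaves the upper 4 bytes of the slot as they were, so a later 8-byte load picks up whatever was there before.

```c
#include <stdint.h>
#include <string.h>

/* 32-bit store followed by a 64-bit reload.
 * slot_before is whatever the 8-byte slot held beforehand. */
uint64_t store32_load64(uint64_t slot_before, uint64_t arg)
{
    uint64_t slot = slot_before;
    uint32_t low = (uint32_t)arg;      /* like: mov %ecx, mult          */
    memcpy(&slot, &low, sizeof low);   /* overwrites only 4 of 8 bytes  */
    return slot;                       /* like: mov mult, %r12          */
}

/* 64-bit store followed by a 64-bit reload: all 8 bytes are replaced. */
uint64_t store64_load64(uint64_t slot_before, uint64_t arg)
{
    uint64_t slot = slot_before;
    memcpy(&slot, &arg, sizeof arg);   /* like: mov %rcx, mult          */
    return slot;                       /* like: mov mult, %r12          */
}
```

For example, with a slot that previously held 0xDEADBEEF00000000 and an argument of 5, the first function returns 0xDEADBEEF00000005 while the second returns 5. If mult is an 8-byte variable, storing the full register (%rcx is the fourth integer argument register in the System V AMD64 convention) would avoid the stale upper half.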