#45 |
Jan 2005
Caught in a sieve
5·79 Posts |
Yeah...but...then the problem with "index conflict with x[]" pops up again. As you noted, wrapindex tends to equal N-1 (though I'm by no means sure it always will). So that means two different threads may access x[INT(N-1)].
So I was trying to set things up so that the thread from cuda_normalize3_kernel that would access, in this case, x[INT(N-1)] would first run the code from cuda_normalize2_kernel that would also access x[INT(N-1)]. See?
#46 | |
Jul 2009
Tokyo
2×5×61 Posts |
Quote:
j never takes the value INT(N-1).
#47 | |
Jul 2009
Tokyo
2·5·61 Posts |
Now, the first thing wanted: another 64-bit Linux bug.
On 32-bit Linux it works correctly. Quote:
#48 |
Jul 2009
Tokyo
2·5·61 Posts |
Reduced memory access.

    $ time ./llrCUDA -q5*2^1282755+1 -d
    Starting Proth prime test of 5*2^1282755+1, FFTLEN = 131072 ; a = 3
    5*2^1282755+1 is prime! Time : 1008.943 sec.

    real 16m49.589s
    user 7m52.300s
    sys 5m48.820s
#49 |
Jan 2005
Caught in a sieve
395₁₀ Posts |
V0.16, about 3% faster than 0.15.
5*2^1282755+1 is prime! Time : 911.813 sec.
And about 0.7 ms/bit!
Changes:
- Everted if statement with carry clause.
- Merged cuda_normalize2_kernel into the two cuda_normalize1_kernels.
- Fixed flag error check - or, not and.
- Used flag error check only on the error checking cases. This is how Jean Penne did it.
- "flag" is now returned as a special err value, as it's an exceptional case. This also saves bandwidth.
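The 0.7 ms/bit figure can be checked by dividing the run time by the exponent. A quick sketch in C, assuming "bit" here means one bit of the exponent n = 1282755; the helper name `ms_per_bit` is made up for this illustration:

```c
/* Milliseconds per bit for a test of k*2^n+1 that took `seconds`.
   Assumption: "bit" refers to one bit of the exponent n. */
static double ms_per_bit(double seconds, long n)
{
    return seconds * 1000.0 / (double)n;
}
```

With the numbers above, `ms_per_bit(911.813, 1282755)` comes out at roughly 0.71.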
#50 |
Jul 2009
Tokyo
1142₈ Posts |
Hi, Ken_g6.
Great!
#51 | |
Jul 2009
Tokyo
1142₈ Posts |
Quote:
#52 |
May 2004
FRANCE
2×307 Posts |
#53 | |
Jul 2009
Tokyo
2·5·61 Posts |
Quote:
But it doesn't work. Code:

    void
    addsignal(
        giant x,
        double *z,
        int n
    )
    {
        register int j, k, m, car, last;
        register double f, g, err;

        maxFFTerror = 0;
        last = 0;
        for (j=0;j<n;j++)
        {
            f = gfloor(z[j]+0.5);
            if(f != 0.0) last = j;
            if (checkFFTerror)
            {
                err = fabs(f - z[j]);
                if (err > maxFFTerror)
                    maxFFTerror = err;
            }
            z[j] =0;
            k = 0;
            do
            {
                g = gfloor(f*TWOM16);
                z[j+k] += f-g*TWO16;
                ++k;
                f=g;
            } while(f != 0.0);

"TWO16" is 65536. giants uses 32-bit ints, with each value kept below 16 bits; in a large prime test this value overflows.
Last fiddled with by msft on 2011-01-13 at 06:46
#54 | |
May 2004
FRANCE
2×307 Posts |
Quote:
Perhaps you could call setmulmode() at init to stop giants.c from using FFT for its operations (you don't need that, because giants is used only to do conversions...).
Here are the parameter values you may use in setmulmode:

    #define AUTO_MUL 0
    #define GRAMMAR_MUL 1
    #define FFT_MUL 2
    #define KARAT_MUL 3

I hope it can resolve your overflow problem...
Jean
#55 |
Jul 2009
Tokyo
2·5·61 Posts |
Code:

    do
    {
        g = gfloor(f*TWOM16);
        z[j+k] += f-g*TWO16;
        ++k;
        f=g;
    } while(f != 0.0);