2011-01-11, 05:41 #45
Jan 2005
Caught in a sieve
394_{10} Posts 
Yeah... but then the "index conflict with x[]" problem pops up again. As you noted, wrapindex tends to equal N1 (though I'm by no means sure it always will), so two different threads may access x[INT(N1)].
So I was trying to set things up so that the thread from cuda_normalize3_kernel that would access, in this case, x[INT(N1)] would first run the code from cuda_normalize2_kernel that would also access x[INT(N1)]. See? 
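A minimal sequential sketch in C of the "one owner per index" idea, assuming a plain cyclic carry wrap into x[0] and equal-sized per-thread chunks. Neither assumption comes from llrCUDA, and pass1_normalize, pass2_add_carry, normalize_cyclic and BASE are hypothetical names. Each loop iteration stands in for one GPU thread; every element of x[] has exactly one writer per pass, so the wrap target is never touched by two threads at once.

```c
#include <assert.h>
#include <stddef.h>

#define BASE 65536LL   /* hypothetical digit base, 2^16 */

/* Pass 1, one call per "thread" t: normalize x[t*chunk .. (t+1)*chunk - 1]
 * to digits in [0, BASE) and store the carry out of the chunk in carry[t].
 * Thread t writes only its own chunk and its own carry slot. */
static void pass1_normalize(long long *x, long long *carry, size_t chunk, size_t t)
{
    long long c = 0;
    for (size_t i = t * chunk; i < (t + 1) * chunk; i++) {
        long long v = x[i] + c;
        c = v / BASE;
        if (v % BASE < 0)   /* C division truncates; round the carry down */
            c -= 1;
        x[i] = v - c * BASE;
    }
    carry[t] = c;
}

/* Pass 2: thread t adds the previous chunk's carry into its own first digit.
 * Thread 0 receives the last chunk's carry (the cyclic wrap), so the wrap
 * target x[0] still has exactly one writer. */
static void pass2_add_carry(long long *x, const long long *carry, size_t chunk,
                            size_t nthreads, size_t t)
{
    size_t prev = (t == 0) ? nthreads - 1 : t - 1;
    x[t * chunk] += carry[prev];
}

/* Sequential stand-in for two kernel launches: all of pass 1 finishes before
 * pass 2 starts, as with two launches on the same CUDA stream. */
static void normalize_cyclic(long long *x, long long *carry, size_t n, size_t nthreads)
{
    assert(nthreads > 0);
    size_t chunk = n / nthreads;
    assert(chunk * nthreads == n);   /* sketch handles equal chunks only */
    for (size_t t = 0; t < nthreads; t++)
        pass1_normalize(x, carry, chunk, t);
    for (size_t t = 0; t < nthreads; t++)
        pass2_add_carry(x, carry, chunk, nthreads, t);
}
```

With x = {5, 70000, 7, 131081} and two threads, the result is {7, 4464, 8, 9}, the same value modulo BASE^4 - 1. The first digit of each chunk can end up slightly out of range after pass 2; this sketch does not re-normalize it.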
2011-01-11, 06:03 #46
Jul 2009
Tokyo
610_{10} Posts 
Quote:
j never takes the value INT(N1).

2011-01-11, 08:13 #47
Jul 2009
Tokyo
2×5×61 Posts 
Now. First wanted: another 64-bit Linux bug.
On 32-bit Linux it is correct.
Quote:


2011-01-12, 08:36 #48
Jul 2009
Tokyo
2×5×61 Posts 
Reduced memory access.
$ time ./llrCUDA -q5*2^1282755+1 -d
Starting Proth prime test of 5*2^1282755+1, FFTLEN = 131072 ; a = 3
5*2^1282755+1 is prime!  Time : 1008.943 sec.

real    16m49.589s
user    7m52.300s
sys     5m48.820s
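For comparison with the "ms/bit" figure quoted for V0.16 in the next post, dividing each reported test time by the exponent n = 1282755 (roughly one modular squaring per bit) gives:

```latex
\[
\frac{1008.943\ \mathrm{s}}{1282755} \approx 0.787\ \mathrm{ms/bit}
\qquad\text{vs.}\qquad
\frac{911.813\ \mathrm{s}}{1282755} \approx 0.711\ \mathrm{ms/bit}
\]
```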
2011-01-13, 01:44 #49
Jan 2005
Caught in a sieve
2·197 Posts 
V0.16, about 3% faster than 0.15.
5*2^1282755+1 is prime! Time : 911.813 sec.
And about 0.7 ms/bit!
Changes:
- Everted if statement with carry clause.
- Merged cuda_normalize2_kernel into the two cuda_normalize1_kernels.
- Fixed flag error check: or, not and.
- Used flag error check only on the error-checking cases. This is how Jean Penne did it.
- "flag" is now returned as a special err value, as it's an exceptional case. This also saves bandwidth.
2011-01-13, 02:04 #50
Jul 2009
Tokyo
2×5×61 Posts 
Hi, Ken_g6,
Great!
2011-01-13, 06:13 #51
Jul 2009
Tokyo
2×5×61 Posts 
Quote:


2011-01-13, 06:26 #52
May 2004
FRANCE
1057_{8} Posts 
giants.c problem...

2011-01-13, 06:43 #53
Jul 2009
Tokyo
2·5·61 Posts 
Quote:
But it doesn't work.
Code:
/* giants.c, lines 2716-2746 (as posted) */
void
addsignal(
    giant x,
    double *z,
    int n
)
{
    register int j, k, m, car, last;
    register double f, g, err;

    maxFFTerror = 0;
    last = 0;
    for (j = 0; j < n; j++)
    {
        f = gfloor(z[j] + 0.5);
        if (f != 0.0) last = j;
        if (checkFFTerror)
        {
            err = fabs(f - z[j]);
            if (err > maxFFTerror)
                maxFFTerror = err;
        }
        z[j] = 0;
        k = 0;
        do
        {
            g = gfloor(f * TWOM16);
            z[j + k] += f - g * TWO16;  /* low 16 bits of f */
            ++k;
            f = g;                      /* carry into the next slot */
        } while (f != 0.0);
        /* ... excerpt ends here in the post ... */
    }
    /* ... remainder of addsignal() not shown ... */
}
"TWO16" is 65536. giants uses 32-bit ints with each value < 16 bits; in a large prime test, this value overflows.
Last fiddled with by msft on 2011-01-13 at 06:46
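To put rough numbers on the overflow remark: with 16-bit digits a single digit product already needs 32 bits, and one coefficient of the convolution of two n-digit numbers can reach n * (2^16 - 1)^2. The hypothetical helpers below compute that worst case for 5*2^1282755+1 (1282758 bits). They only bound the intermediate sums; they do not show where giants.c itself overflows.

```c
#include <assert.h>
#include <stdint.h>

/* Number of base-2^16 digits needed for a number of the given bit length. */
static uint64_t digits16(uint64_t bits)
{
    return (bits + 15) / 16;
}

/* Worst case for one coefficient of the convolution of two numbers with
 * ndigits base-2^16 digits each: ndigits products, each at most (2^16-1)^2. */
static uint64_t worst_convolution_coeff(uint64_t ndigits)
{
    const uint64_t dmax = 65535u;            /* largest 16-bit digit */
    assert(ndigits < (UINT64_C(1) << 32));   /* keeps the result below 2^64 */
    return ndigits * dmax * dmax;
}
```

For this number, digits16(1282758) is 80173 and the bound is about 3.4e14: far beyond a 32-bit int, yet below 2^53, so a double can represent every such sum exactly (the FFT's own round-off aside).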

2011-01-13, 07:35 #54
May 2004
FRANCE
13×43 Posts 
Perhaps setmulmode...
Quote:
Perhaps you could call setmulmode() at init to stop giants.c from using FFT for its operations (you don't need that, because giants is used only for conversions...). Here are the parameter values you can use with setmulmode:
#define AUTO_MUL 0
#define GRAMMAR_MUL 1
#define FFT_MUL 2
#define KARAT_MUL 3
I hope it can resolve your overflow problem...
Jean
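For reference, "grammar-school" multiplication is the ordinary schoolbook method. The sketch below is a generic version on little-endian base-2^16 digits (grammar_mul16 is a hypothetical name, not the giants.c routine). Every partial product and carry is an exact integer that fits in 32 bits, so there is no floating-point round-off to go wrong on large operands, at the cost of O(n^2) digit operations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Schoolbook product of little-endian base-2^16 digit arrays:
 * r[0 .. na+nb-1] = a * b. r must not overlap a or b. */
static void grammar_mul16(const uint16_t *a, size_t na,
                          const uint16_t *b, size_t nb, uint16_t *r)
{
    assert(r != a && r != b);   /* catches only the obvious aliasing case */
    for (size_t i = 0; i < na + nb; i++)
        r[i] = 0;
    for (size_t i = 0; i < na; i++) {
        uint32_t carry = 0;
        for (size_t j = 0; j < nb; j++) {
            /* (2^16-1)^2 + (2^16-1) + (2^16-1) = 2^32 - 1: fits in uint32_t */
            uint32_t t = (uint32_t)a[i] * b[j] + r[i + j] + carry;
            r[i + j] = (uint16_t)(t & 0xFFFFu);
            carry = t >> 16;
        }
        r[i + nb] = (uint16_t)carry;   /* not yet written by earlier rows */
    }
}
```

For example, squaring {0xFFFF, 0xFFFF} (that is, 2^32 - 1) gives {0x0001, 0x0000, 0xFFFE, 0xFFFF}, i.e. 0xFFFFFFFE00000001.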

2011-01-13, 07:36 #55
Jul 2009
Tokyo
2·5·61 Posts 
Code:
/* giants.c, lines 2740-2746 */
do
{
    g = gfloor(f * TWOM16);
    z[j + k] += f - g * TWO16;
    ++k;
    f = g;
} while (f != 0.0);