#45 |
|
Jan 2005
Caught in a sieve
5·79 Posts |
Yeah...but...then the problem with "index conflict with x[]" pops up again. As you noted, wrapindex tends to equal N-1 (though I'm by no means sure it always will). So that means two different threads may access x[INT(N-1)].
So I was trying to set things up so that the thread from cuda_normalize3_kernel that would access, in this case, x[INT(N-1)] would first run the code from cuda_normalize2_kernel that would also access x[INT(N-1)]. See? |
|
|
|
|
|
#46 | |
|
Jul 2009
Tokyo
2·5·61 Posts |
Quote:
The j value is never INT(N-1).
|
|
|
|
|
|
#47 | |
|
Jul 2009
Tokyo
2×5×61 Posts |
Now, the first thing I want fixed: another 64-bit Linux bug.
On 32-bit Linux the result is correct. Quote:
|
|
|
|
|
|
|
#48 |
|
Jul 2009
Tokyo
262₁₆ Posts |
Reduced memory access.
Code:

    $ time ./llrCUDA -q5*2^1282755+1 -d
    Starting Proth prime test of 5*2^1282755+1, FFTLEN = 131072 ; a = 3
    5*2^1282755+1 is prime! Time : 1008.943 sec.

    real 16m49.589s
    user 7m52.300s
    sys 5m48.820s
|
|
|
|
|
#49 |
|
Jan 2005
Caught in a sieve
5×79 Posts |
V0.16, about 3% faster than 0.15.
5*2^1282755+1 is prime! Time : 911.813 sec.
And about 0.7 ms/bit!
Changes:
- Everted if statement with carry clause.
- Merged cuda_normalize2_kernel into the two cuda_normalize1_kernels.
- Fixed flag error check - or, not and.
- Used flag error check only on the error checking cases. This is how Jean Penne did it.
- "flag" is now returned as a special err value, as it's an exceptional case. This also saves bandwidth.
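The "about 0.7 ms/bit" figure can be checked from the reported time. This is a minimal sketch, assuming "bit" means one bit of the exponent 1282755 (one iteration per exponent bit); `ms_per_bit` is a hypothetical helper name, not part of llrCUDA.

```c
#include <assert.h>

/* Average milliseconds per exponent bit: total seconds * 1000 / exponent.
   Assumption: the test performs roughly one iteration per exponent bit. */
static double ms_per_bit(double seconds, long exponent_bits)
{
    assert(exponent_bits > 0);
    return seconds * 1000.0 / (double)exponent_bits;
}
```

Calling `ms_per_bit(911.813, 1282755L)` gives roughly 0.71, which matches the rounded 0.7 ms/bit quoted above.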
|
|
|
|
|
#50 |
|
Jul 2009
Tokyo
2·5·61 Posts |
Hi, Ken_g6.
Great!
|
|
|
|
|
|
#51 | |
|
Jul 2009
Tokyo
2·5·61 Posts |
Quote:
|
|
|
|
|
|
|
#52 |
|
May 2004
FRANCE
2⁴·3·13 Posts |
|
|
|
|
|
|
#53 | |
|
Jul 2009
Tokyo
610₁₀ Posts |
Quote:
But it doesn't work. Code:

    void
    addsignal(
        giant x,
        double *z,
        int n
    )
    {
        register int j, k, m, car, last;
        register double f, g, err;

        maxFFTerror = 0;
        last = 0;
        for (j = 0; j < n; j++)
        {
            f = gfloor(z[j] + 0.5);          /* round to nearest integer */
            if (f != 0.0) last = j;
            if (checkFFTerror)
            {
                err = fabs(f - z[j]);        /* rounding error of this element */
                if (err > maxFFTerror)
                    maxFFTerror = err;
            }
            z[j] = 0;
            k = 0;
            do                               /* spread f over base-65536 digits */
            {
                g = gfloor(f * TWOM16);
                z[j + k] += f - g * TWO16;
                ++k;
                f = g;
            } while (f != 0.0);

"TWO16" is 65536. giants uses 32-bit ints, with each value kept below 16 bits. In a large prime test, this value overflows.
Last fiddled with by msft on 2011-01-13 at 06:46
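The inner do/while loop in the excerpt can be read as splitting each rounded value into base-65536 digits and carrying the excess into higher positions. The following is a minimal standalone sketch of that idea, not the giants.c code itself: `spread_carries` and `floor_nonneg` are hypothetical names, `floor_nonneg` stands in for `gfloor`, and inputs are assumed to be non-negative, below 2^63, with enough room at the top of the array for the final carry.

```c
#include <assert.h>
#include <stddef.h>

#define TWO16  65536.0          /* 2^16, as in the excerpt */
#define TWOM16 (1.0 / 65536.0)  /* 2^-16 */

/* Stand-in for gfloor: truncation equals floor for non-negative values
   that fit in a long long (an assumption of this sketch). */
static double floor_nonneg(double v)
{
    assert(v >= 0.0 && v < 9.0e18);
    return (double)(long long)v;
}

/* Round every z[j], then rewrite it as base-65536 digits, adding the
   higher digits (the carries) into z[j+1], z[j+2], ... */
static void spread_carries(double *z, size_t n)
{
    for (size_t j = 0; j < n; j++) {
        double f = floor_nonneg(z[j] + 0.5);
        size_t k = 0;
        z[j] = 0.0;
        do {
            double g = floor_nonneg(f * TWOM16);  /* f / 65536, rounded down */
            assert(j + k < n);                    /* caller must leave headroom */
            z[j + k] += f - g * TWO16;            /* f mod 65536 */
            ++k;
            f = g;
        } while (f != 0.0);
    }
}
```

After the call every element is below 65536 and the value represented by the array in base 65536 is unchanged. The sketch only shows the digit splitting; it does not reproduce the overflow reported above.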
|
|
|
|
|
|
#54 | |
|
May 2004
FRANCE
2⁴×3×13 Posts |
Quote:
Perhaps you may use setmulmode( ) at init. to disallow giants.c to use FFT to do its operations (you don't need that because giants is used only to do conversions...).
Here are the parameter values you may use in setmulmode:

    #define AUTO_MUL 0
    #define GRAMMAR_MUL 1
    #define FFT_MUL 2
    #define KARAT_MUL 3

I hope it can resolve your overflow problem...
Jean
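A sketch of what the init-time call suggested above might look like. The `mulmode` variable and the `setmulmode` body here are stand-ins so the snippet compiles on its own; the real function lives in giants.c and its signature should be checked against giants.h. `init_giants_mulmode` is a hypothetical name, and choosing `GRAMMAR_MUL` over `KARAT_MUL` is an assumption: either should keep giants.c away from FFT multiplication.

```c
#include <assert.h>

/* Mode values as listed in the post above. */
#define AUTO_MUL    0
#define GRAMMAR_MUL 1
#define FFT_MUL     2
#define KARAT_MUL   3

/* Stand-in for the giants.c state and setter (assumed shape).
   In a real build, drop these two definitions and link against giants.c. */
static int mulmode = AUTO_MUL;

static void setmulmode(int mode)
{
    assert(mode >= AUTO_MUL && mode <= KARAT_MUL);
    mulmode = mode;
}

/* Hypothetical init hook: select a non-FFT multiply once, before any
   giants arithmetic runs. */
static void init_giants_mulmode(void)
{
    setmulmode(GRAMMAR_MUL);
}
```

Calling `init_giants_mulmode()` once at program start, before the first conversion through giants, would be the natural place for this.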
|
|
|
|
|
|
#55 |
|
Jul 2009
Tokyo
2·5·61 Posts |
Code:

    do
    {
        g = gfloor(f*TWOM16);
        z[j+k] += f-g*TWO16;
        ++k;
        f=g;
    } while(f != 0.0);
|
|
|
|