mersenneforum.org  

2011-01-11, 05:41   #45
Ken_g6

Jan 2005
Caught in a sieve

394₁₀ Posts

Quote:
Originally Posted by msft
We can merge cuda_normalize2_kernel & cuda_normalize3_kernel.

Yeah...but...then the problem with "index conflict with x[]" pops up again. As you noted, wrapindex tends to equal N-1 (though I'm by no means sure it always will). So that means two different threads may access x[INT(N-1)].

So I was trying to set things up so that the thread from cuda_normalize3_kernel that would access, in this case, x[INT(N-1)] would first run the code from cuda_normalize2_kernel that would also access x[INT(N-1)]. See?
2011-01-11, 06:03   #46
msft

Jul 2009
Tokyo

610₁₀ Posts

Quote:
Originally Posted by Ken_g6
Yeah...but...then the problem with "index conflict with x[]" pops up again. As you noted, wrapindex tends to equal N-1 (though I'm by no means sure it always will). So that means two different threads may access x[INT(N-1)].

It depends on the carry value. Before j reaches INT(N-1), carry becomes 0 and the loop exits.
So j never takes the value INT(N-1).
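
The argument above rests on a carry loop that stops as soon as the carry is used up. A minimal sketch of that behaviour in plain C follows. It is not the llrCUDA kernel: the function name, the integer digit type and the `bits` parameter are all made up for illustration. Whether the carry always dies out before N-1 depends on the data, which is exactly the doubt raised in the quoted post.

```c
#include <assert.h>

/* Hypothetical illustration, not the llrCUDA kernel: push `carry`
 * through the base-2^bits digits x[start..n-1] and stop as soon as the
 * carry is used up. Returns the last index written, or start - 1 if the
 * carry was already zero on entry. */
static int propagate_carry(long long *x, int n, int start,
                           long long carry, int bits)
{
    assert(n > 0 && start >= 0 && start < n && bits > 0 && bits < 62);
    const long long base = 1LL << bits;
    int j = start;
    int last = start - 1;

    while (carry != 0 && j < n) {
        long long t = x[j] + carry;
        long long digit = t % base;
        if (digit < 0)
            digit += base;            /* keep the digit in [0, base) */
        carry = (t - digit) / base;   /* exact, so this is a floor division */
        x[j] = digit;
        last = j;
        ++j;
    }
    return last;
}
```

With `x = {65535, 65535, 1, 0, 0, 0, 0, 0}`, a carry of 1 and 16-bit digits, the loop writes indices 0 to 2 and then exits; the upper indices are never touched.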
2011-01-11, 08:13   #47
msft

Jul 2009
Tokyo

2×5×61 Posts

Now, the first thing to look at: another 64-bit Linux bug.
On 32-bit Linux it works correctly.
Quote:
$ unzip llrpisrc.zip
$ cd llrpi/linuxllr
llrpi/linuxllr$ make
llrpi/linuxllr$ ./llrpi -q7*2^3015762+1
Segmentation fault
2011-01-12, 08:36   #48
msft

Jul 2009
Tokyo

2×5×61 Posts

Reduced memory access.
$ time ./llrCUDA -q5*2^1282755+1 -d
Starting Proth prime test of 5*2^1282755+1, FFTLEN = 131072 ; a = 3
5*2^1282755+1 is prime! Time : 1008.943 sec.

real 16m49.589s
user 7m52.300s
sys 5m48.820s
Attached Files
File Type: gz llrCUDA.0.15.tar.gz (94.2 KB, 118 views)
2011-01-13, 01:44   #49
Ken_g6

Jan 2005
Caught in a sieve

2·197 Posts

V0.16, about 3% faster than 0.15.

5*2^1282755+1 is prime! Time : 911.813 sec.
And about 0.7 ms/bit!
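
The ms/bit figure can be checked from the numbers in the post, assuming "bit" means one bit of the exponent 1282755. The helper name below is made up for this check.

```c
#include <assert.h>

/* Milliseconds per exponent bit for a test of k*2^n+1 that took
 * `seconds` in total (hypothetical helper, for the arithmetic only). */
static double ms_per_bit(double seconds, long exponent_bits)
{
    assert(exponent_bits > 0);
    return seconds * 1000.0 / (double)exponent_bits;
}
```

`ms_per_bit(911.813, 1282755)` comes out at roughly 0.711, in line with the "about 0.7 ms/bit" quoted above.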

Changes:

- Everted if statement with carry clause.
- Merged cuda_normalize2_kernel into the two cuda_normalize1_kernels.
- Fixed flag error check - or, not and.
- Used flag error check only on the error checking cases. This is how Jean Penne did it.
- "flag" is now returned as a special err value, as it's an exceptional case. This also saves bandwidth.
Attached Files
File Type: bz2 llrcuda.0.16.tar.bz2 (87.7 KB, 100 views)
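
On the last item in the change list, one way a flag can ride along in the err value is sketched below. This is a guess at the idea, not the llrCUDA 0.16 code: the sentinel value 1.0 and every name here are assumptions. It relies only on the fact that a true round-off error |z - round(z)| can never exceed 0.5, so any larger value is free to signal the exceptional case, and only one value has to be copied back.

```c
#include <assert.h>

/* Hypothetical sentinel: larger than any genuine round-off error. */
#define ERR_FLAGGED 1.0

/* Distance from z to the nearest integer, for |z| well below 2^63. */
static double roundoff_error(double z)
{
    double r = (double)(long long)(z < 0 ? z - 0.5 : z + 0.5);
    double e = z - r;
    return e < 0 ? -e : e;
}

/* Producer side: the flag, if set, overrides the measured error. */
static double pack_err(double z, int flag)
{
    return flag ? ERR_FLAGGED : roundoff_error(z);
}

/* Consumer side: one value answers both questions. */
static int is_flagged(double err)
{
    return err >= ERR_FLAGGED;
}
```

The design trade-off is that the measured error is lost whenever the flag is set, which seems acceptable if the flagged case is treated as a failure anyway.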
2011-01-13, 02:04   #50
msft

Jul 2009
Tokyo

2×5×61 Posts

Hi, Ken_g6
Great!
2011-01-13, 06:13   #51
msft

Jul 2009
Tokyo

2×5×61 Posts

Quote:
$ unzip llrpisrc.zip
$ cd llrpi/linuxllr
llrpi/linuxllr$ make
llrpi/linuxllr$ ./llrpi -q7*2^3015762+1
Segmentation fault

giants.c uses a 16-bit-length array; in a large prime test it overflows.
2011-01-13, 06:26   #52
Jean Penné

May 2004
FRANCE

1057₈ Posts
giants.c problem...

Quote:
Originally Posted by msft
giants.c uses a 16-bit-length array; in a large prime test it overflows.

You may increase MAX_SHORTS in giants.h; 1<<24 seems to work.
I am very happy to follow your successes!
Jean
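
For a sense of scale, here is a rough count of how many 16-bit elements the failing candidate 7*2^3015762+1 occupies. The helper name is made up, and the thread does not state the default MAX_SHORTS, so this only compares the requirement against a 16-bit range and against the suggested 1<<24.

```c
#include <assert.h>

/* Number of 16-bit elements needed to hold k*2^n + 1, where k_bits is
 * the bit length of k. For n >= 1, k*2^n is even, so adding 1 cannot
 * lengthen the number. */
static long shorts_needed(int k_bits, long n)
{
    assert(k_bits > 0 && n > 0);
    long bits = (long)k_bits + n;   /* bit length of k*2^n + 1 */
    return (bits + 15) / 16;        /* round up to whole 16-bit elements */
}
```

`shorts_needed(3, 3015762)` (7 is three bits long) returns 188486. That is well past anything a 16-bit count can describe and far below 1<<24 = 16777216, even allowing for intermediate results about twice as long as the number itself.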
2011-01-13, 06:43   #53
msft

Jul 2009
Tokyo

2·5·61 Posts

Quote:
Originally Posted by Jean Penné
You may increase MAX_SHORTS in giants.h; 1<<24 seems to work.
I am very happy to follow your successes!
Jean

Thank you very much.
But it doesn't work.
Code:
void
addsignal(
    giant   x,
    double  *z,
    int     n
)
{
    register int    j, k, m, car, last;
    register double f, g, err;

    maxFFTerror = 0;
    last = 0;
    for (j = 0; j < n; j++)
    {
        f = gfloor(z[j] + 0.5);
        if (f != 0.0) last = j;
        if (checkFFTerror)
        {
            err = fabs(f - z[j]);
            if (err > maxFFTerror)
                maxFFTerror = err;
        }
        z[j] = 0;
        k = 0;
        do
        {
            g = gfloor(f * TWOM16);
            z[j + k] += f - g * TWO16;
            ++k;
            f = g;
        } while (f != 0.0);
When z[j] < -0.5, this while loop cannot exit.
"TWO16" is 65536.
giant uses 32-bit ints with each value below 16 bits; in a large prime test this value overflows.
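
The non-termination claim can be tried out on the do/while in isolation. The sketch below is not giants.c: `floor_sketch` stands in for gfloor, TWOM16 is assumed to be 1/65536, and a step limit is added so the test itself terminates. The rest of addsignal is not shown in this thread, so whether a negative value can actually reach the loop is not settled by this.

```c
#include <assert.h>

#define TWO16  65536.0
#define TWOM16 (1.0 / 65536.0)   /* assumed: the reciprocal of TWO16 */

/* Stand-in for gfloor(): largest integer <= x, for |x| below 2^63. */
static double floor_sketch(double x)
{
    double t = (double)(long long)x;   /* truncates toward zero */
    return (t > x) ? t - 1.0 : t;
}

/* Run the quoted do/while on a single value f, giving up after
 * max_steps. Returns the number of base-65536 digits produced, or -1
 * if f never reached zero within the limit. */
static int split_steps(double f, int max_steps)
{
    assert(max_steps > 0);
    int k = 0;
    do {
        double g = floor_sketch(f * TWOM16);
        /* the digit f - g * TWO16 would be added to z[j + k] here */
        ++k;
        f = g;
    } while (f != 0.0 && k < max_steps);
    return (f == 0.0) ? k : -1;
}
```

For f = 196613 (3*65536 + 5) the loop finishes in two steps. For f = -1 the floored quotient is -1 again on every pass, so f never reaches zero and, without the step limit, k would keep growing past the end of z.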

Last fiddled with by msft on 2011-01-13 at 06:46
2011-01-13, 07:35   #54
Jean Penné

May 2004
FRANCE

13×43 Posts
Perhaps setmulmode...

Quote:
Originally Posted by msft
Thank you very much.
But it doesn't work.
Code:
void
addsignal(
    giant   x,
    double  *z,
    int     n
)
{
    register int    j, k, m, car, last;
    register double f, g, err;

    maxFFTerror = 0;
    last = 0;
    for (j = 0; j < n; j++)
    {
        f = gfloor(z[j] + 0.5);
        if (f != 0.0) last = j;
        if (checkFFTerror)
        {
            err = fabs(f - z[j]);
            if (err > maxFFTerror)
                maxFFTerror = err;
        }
        z[j] = 0;
        k = 0;
        do
        {
            g = gfloor(f * TWOM16);
            z[j + k] += f - g * TWO16;
            ++k;
            f = g;
        } while (f != 0.0);
When z[j] < -0.5, this while loop cannot exit.
"TWO16" is 65536.
giant uses 32-bit ints with each value below 16 bits; in a large prime test this value overflows.

Perhaps you could call setmulmode() at initialization to stop giants.c from using
FFT for its operations (you don't need FFT there, because giants is used
only for conversions).
Here are the parameter values you can pass to setmulmode:

#define AUTO_MUL 0
#define GRAMMAR_MUL 1
#define FFT_MUL 2
#define KARAT_MUL 3

I hope it can resolve your overflow problem...
Jean
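
A small sketch of the suggestion above: pin the multiply mode once at start-up so the FFT path is never chosen. The four constants are the ones listed in the post. Everything else is an assumption: `setmulmode_sketch` is a stand-in so the snippet compiles without giants.c, and the init hook is hypothetical. In the real program the call would be to setmulmode() itself.

```c
#include <assert.h>

/* Mode constants as listed in the post. */
#define AUTO_MUL    0
#define GRAMMAR_MUL 1
#define FFT_MUL     2
#define KARAT_MUL   3

/* Stand-in for setmulmode() in giants.c so this compiles on its own;
 * it only records the requested mode. */
static int mulmode_sketch = AUTO_MUL;

static void setmulmode_sketch(int mode)
{
    assert(mode >= AUTO_MUL && mode <= KARAT_MUL);
    mulmode_sketch = mode;
}

/* Hypothetical init hook: choose a non-FFT multiply before any giants
 * conversion runs. KARAT_MUL would avoid the FFT path as well. */
static void init_giants_sketch(void)
{
    setmulmode_sketch(GRAMMAR_MUL);
}
```

Leaving the mode at AUTO_MUL presumably lets the library pick FFT for large operands, which is the case this is meant to rule out.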
2011-01-13, 07:36   #55
msft

Jul 2009
Tokyo

2·5·61 Posts

Code:
do
{
    g = gfloor(f * TWOM16);
    z[j + k] += f - g * TWO16;
    ++k;
    f = g;
} while (f != 0.0);
Sorry, I now understand what this loop means.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.