mersenneforum.org  

Old 2019-09-01, 02:47   #738
c10ck3r
 
Aug 2010
Kansas

547₁₀ Posts

Quote:
Originally Posted by kriesel View Post
Sweet. You're welcome. What size exponents do you plan to run? See
https://www.mersenneforum.org/showth...365#post489365 and following posts for an idea of exponent limits on other gpu models.
Please provide any success or failure info versus exponent sizes tried, and I'll add it.
Also whether your GTX1050 is a 2GB or 3GB unit.
Tested it out with a 94M exponent, B1=855k, B2=~18.2M, e=2, on the 2GB model. 9.4451 GHz-days. I don't have an exact duration, but at the beginning it gave an ETA of 6.75-7.5 hrs; not sure if that was the stage 1 ETA only.
I did, however, find out that I can't run either mfaktc or CUDAPm1 at the same time as 4x P-1 in P95, even though I've got 32GB RAM and a Threadripper. Froze my computer within about 5 minutes of starting with both programs. Yikes lol


Edit: Is it possible to do Stage 1 only with CUDAPm1 and then perform Stage 2 in P95? Just thinking since I have quite a bit of RAM available, this could allow me to focus on some deeper runs of P-1

Last fiddled with by c10ck3r on 2019-09-01 at 02:52
Old 2019-09-01, 12:29   #739
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2⁴·3·163 Posts

Quote:
Originally Posted by c10ck3r View Post
Tested it out with a 94M exponent, B1=855k, B2=~18.2M, e=2, on the 2GB model. 9.4451 GHz-days. I don't have an exact duration, but at the beginning it gave an ETA of 6.75-7.5 hrs; not sure if that was the stage 1 ETA only.
I did, however, find out that I can't run either mfaktc or CUDAPm1 at the same time as 4x P-1 in P95, even though I've got 32GB RAM and a Threadripper. Froze my computer within about 5 minutes of starting with both programs. Yikes lol

Edit: Is it possible to do Stage 1 only with CUDAPm1 and then perform Stage 2 in P95? Just thinking since I have quite a bit of RAM available, this could allow me to focus on some deeper runs of P-1
CUDAPm1 gives an ETA per stage only, yes. See the Polite directive in CUDAPm1.ini for system responsiveness.

On prime95 I mix PRP and P-1. It seems to like one P-1 per cpu package better, although that could be system dependent. P-1 likes lots of memory.

Re starting with stage 1 on CUDAPm1 and finishing with stage 2 on prime95 for the same exponent, I think that would require some software development, if it's possible at all. See https://www.mersenneforum.org/showpo...3&postcount=24
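
For anyone following the stage 1 / stage 2 discussion, here is a toy sketch of what P-1 stage 1 computes, written in plain Python. It is only an illustration under stated assumptions: the function names, the base 3, and the small test case M67 are my choices, and it uses Python big integers rather than the FFT multiplication and save-file formats that CUDAPm1 or prime95 actually use.

```python
from math import gcd


def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n (n >= 2)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            # Clear every multiple of i starting at i*i.
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]


def pm1_stage1(p, b1, base=3):
    """Toy P-1 stage 1 on M_p = 2^p - 1 with bound b1.

    Returns gcd(base^E - 1, M_p), where E is 2*p times the product of all
    prime powers <= b1. Factors of M_p have the form 2*k*p + 1, so the
    2*p part is included for free.
    """
    m = (1 << p) - 1
    x = pow(base, 2 * p, m)
    for q in primes_up_to(b1):
        qk = q
        while qk * q <= b1:  # largest power of q not exceeding b1
            qk *= q
        x = pow(x, qk, m)
    return gcd(x - 1, m)


if __name__ == "__main__":
    # M67 has the known factor 193707721, and 193707721 - 1 = 2^3 * 3^3 * 5 * 67 * 2677,
    # so a bound of 3000 is enough for stage 1 to pick it up.
    print(pm1_stage1(67, 3000))
```

The residue x left at the end of stage 1 is what stage 2 continues from, which is why handing it from one program to another would need both to agree on a file format.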
Old 2019-11-15, 14:32   #740
storm5510
Random Account
 
Aug 2009
Not U. + S.A.

2²·5·11·13 Posts

I will put this here hoping someone will see it...

On occasion, CUDAPm1 will stop running and exit after completing stage 1. This behavior could suggest that it does not detect enough RAM onboard my GPU for stage 2. It is a GTX 1080 with 8 GB of RAM. Below is how I ran it, by command line:

Code:
cudapm1 98181383 -b1 710000 -b2 13490000
I got these bounds from James Heinrich's "P-1 Probability Calculator" page on https://www.mersenne.ca.

I am running it again with Prime95. Is it possible this B2 is too large to run on my GPU?
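
To make the role of B2 concrete, here is a toy two-stage P-1 sketch in Python. It is an illustration only, with hypothetical function names and a tiny test case (M67 with B1=100, B2=3000) rather than the bounds above. It uses the naive stage 2, one modular power per prime between B1 and B2. Production programs generally precompute tables of powers to speed this up, and that precomputation is the part that wants a lot of memory.

```python
from math import gcd


def small_primes(n):
    """All primes <= n by trial division (fine for toy bounds)."""
    return [q for q in range(2, n + 1)
            if all(q % d for d in range(2, int(q ** 0.5) + 1))]


def pm1_two_stage(p, b1, b2, base=3):
    """Toy P-1 on M_p = 2^p - 1. Returns (stage1_gcd, stage2_gcd)."""
    m = (1 << p) - 1
    primes = small_primes(b2)

    # Stage 1: raise the base to 2*p times every prime power <= b1.
    x = pow(base, 2 * p, m)
    for q in primes:
        if q > b1:
            break
        qk = q
        while qk * q <= b1:
            qk *= q
        x = pow(x, qk, m)
    g1 = gcd(x - 1, m)

    # Stage 2 (naive form): allow ONE extra prime q with b1 < q <= b2.
    # Accumulate the product of (x^q - 1) and take a single gcd at the end.
    acc = (x - 1) % m
    for q in primes:
        if q > b1:
            acc = acc * (pow(x, q, m) - 1) % m
    g2 = gcd(acc, m)
    return g1, g2


if __name__ == "__main__":
    # 193707721 - 1 = 2^3 * 3^3 * 5 * 67 * 2677: everything except 2677 is
    # covered by B1=100, and 2677 falls inside (B1, B2] for B2=3000.
    print(pm1_two_stage(67, 100, 3000))
```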
Old 2019-11-15, 18:17   #741
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2⁴·3·163 Posts

Quote:
Originally Posted by storm5510 View Post
I will put this here hoping someone will see it...

On occasion, CUDAPm1 will stop running and exit after completing stage 1. This behavior could suggest that it does not detect enough RAM onboard my GPU for stage 2. It is a GTX 1080 with 8 GB of RAM. Below is how I ran it, by command line:

Code:
cudapm1 98181383 -b1 710000 -b2 13490000
I got these bounds from James Heinrich's "P-1 Probability Calculator" page on https://www.mersenne.ca.

I am running it again with Prime95. Is it possible this B2 is too large to run on my GPU?
Did it produce a message stating it completed the stage 1 gcd? If so, you could report the stage one result.

Normally I would expect that exponent to run to completion easily on that GPU model (or even an exponent about triple that size). But odd things sometimes happen in CUDAPm1. Sometimes resuming after a stop will do the job. I've hit spots on the exponent number line that one GPU can't finish, but that a second GPU with the same or larger GPU RAM can finish when I move the work to it. In one case it was even the same model with a different GPU BIOS revision. Quadro 2000s hit issues around 85.5M and 171M exponents. (That model is not recommended for P-1 because the feasible or program-selected bounds are too low.)

There's another thing that happens sometimes that the author described. An excessive roundoff error in stage 2 will produce a very quiet exit, no error message to say why.
For more info, see https://www.mersenneforum.org/showpo...65&postcount=7 Limits
https://www.mersenneforum.org/showpo...73&postcount=9 Limit and run time detail by gpu model
https://www.mersenneforum.org/showpo...37&postcount=3 Errors in general
https://www.mersenneforum.org/showpo...82&postcount=7 P-1 progress
http://www.mersenneforum.org/showpos...34&postcount=3 CUDAPm1 bug and wish list

The most recent versions of Gpuowl can also run on NVIDIA and do P-1 including save files. There may be bugs in it that prevent it from completing P-1 on some exponents, but I haven't found them yet, in admittedly much less sampling of exponents on gpuowl than on CUDAPm1. On large-gpu-ram models like the GTX1080, gpuowl appears capable of going to higher exponents.
See https://www.mersenneforum.org/showpo...5&postcount=17
Run times would be about 30% longer on a GTX1080 than on a GTX1080Ti.
Zip file for Windows gpuowl v6.11-9 at https://www.mersenneforum.org/showpo...postcount=1403

In any event, have fun!

Last fiddled with by kriesel on 2019-11-15 at 18:22
Old 2019-11-15, 22:48   #742
storm5510
Random Account
 
Aug 2009
Not U. + S.A.

2²·5·11·13 Posts

Quote:
Originally Posted by kriesel View Post
Did it produce a message stating it completed the stage 1 gcd? If so, you could report the stage one result.

Normally I would expect that exponent to run to completion easily on that GPU model (or even an exponent about triple that size). But odd things sometimes happen in CUDAPm1. Sometimes resuming after a stop will do the job. I've hit spots on the exponent number line that one GPU can't finish, but that a second GPU with the same or larger GPU RAM can finish when I move the work to it. In one case it was even the same model with a different GPU BIOS revision. Quadro 2000s hit issues around 85.5M and 171M exponents. (That model is not recommended for P-1 because the feasible or program-selected bounds are too low.)

There's another thing that happens sometimes that the author described. An excessive roundoff error in stage 2 will produce a very quiet exit, no error message to say why.
For more info, see https://www.mersenneforum.org/showpo...65&postcount=7 Limits
https://www.mersenneforum.org/showpo...73&postcount=9 Limit and run time detail by gpu model
https://www.mersenneforum.org/showpo...37&postcount=3 Errors in general
https://www.mersenneforum.org/showpo...82&postcount=7 P-1 progress
http://www.mersenneforum.org/showpos...34&postcount=3 CUDAPm1 bug and wish list

The most recent versions of Gpuowl can also run on NVIDIA and do P-1 including save files. There may be bugs in it that prevent it from completing P-1 on some exponents, but I haven't found them yet, in admittedly much less sampling of exponents on gpuowl than on CUDAPm1. On large-gpu-ram models like the GTX1080, gpuowl appears capable of going to higher exponents.
See https://www.mersenneforum.org/showpo...5&postcount=17
Run times would be about 30% longer on a GTX1080 than on a GTX1080Ti.
Zip file for Windows gpuowl v6.11-9 at https://www.mersenneforum.org/showpo...postcount=1403

In any event, have fun!
Thank you for the reply!

The roundoff error never exceeded 0.035. I believe this is what is displayed on every line as "err." Also, it completed the GCD before it dropped out. There was not a results file.

Windows 10, which I am running, has a very detailed Task Manager. It displays information about a GPU, if one is present. At the time, I didn't think the GPU's RAM usage was excessive. It is possible I was looking at stage 1 though.

As for gpuOwl, I have seen posts about it going back a while. I was under the impression it was a Linux-only project. I will give it a try. It uses OpenCL. GPU-Z says I have this capability, but I have never run anything that uses it.

Microsoft has continually updated Windows 10. What I have is v1903 plus some maintenance updates. After a point, the older versions of CUDAPm1 and CUDALucas simply would not start. I replaced them with newer ones I found in James' archive on mersenne.ca. The newer ones did not seem to have any problems, until today.

It will take me a while to digest everything in your links. I appreciate the effort.

Edit

I tried gpuOwl. It gives me all the below and then exits. It would help to see a config file and worktodo example.


Code:
2019-11-15 18:01:36 gpuowl v6.11-9-g9ae3189
2019-11-15 18:01:36 Note: no config.txt file found
2019-11-15 18:01:36 config: -pm1 98181383 
2019-11-15 18:01:36 98181383 FFT 5632K: Width 256x4, Height 64x4, Middle 11; 17.02 bits/word
2019-11-15 18:01:37 OpenCL args "-DEXP=98181383u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=11u -DWEIGHT_STEP=0xf.bbe27b81b7e38p-3 -DIWEIGHT_STEP=0x8.22a2337ec7b7p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-11-15 18:01:37 OpenCL compilation error -11 (args -DEXP=98181383u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=11u -DWEIGHT_STEP=0xf.bbe27b81b7e38p-3 -DIWEIGHT_STEP=0x8.22a2337ec7b7p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4  -I. -cl-fast-relaxed-math -cl-std=CL2.0)
2019-11-15 18:01:37 <kernel>:197:3: error: invalid output constraint '=v' in asm
  X2(u[0], u[2]);
  ^
<kernel>:174:37: note: expanded from macro 'X2'
        __asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.x) : "v" (t.x), "v" (b.x)); \
                                           ^
<kernel>:197:3: error: invalid output constraint '=v' in asm
<kernel>:175:37: note: expanded from macro 'X2'
        __asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.y) : "v" (t.y), "v" (b.y)); \
                                           ^
<kernel>:198:3: error: invalid output constraint '=v' in asm
  X2_mul_t4(u[1], u[3]);
  ^
<kernel>:180:37: note: expanded from macro 'X2_mul_t4'
        __asm( "v_add_f64 %0, %1, -%2\n" : "=v" (t.x) : "v" (b.x), "v" (t.x)); \
                                           ^
<kernel>:198:3: error: invalid output constraint '=v' in asm
<kernel>:181:37: note: expanded from macro 'X2_mul_t4'
        __asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.x) : "v" (t.y), "v" (b.y)); \
                                           ^
<kernel>:199:3: error: invalid output constraint '=v' in asm
  X2(u[0], u[1]);
  ^
<kernel>:174:37: note: expanded from macro 'X2'
        __asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.x) : "v" (t.x), "v" (b.x)); \
                                           ^
<kernel>:199:3: error: invalid output constraint '=v' in asm
<kernel>:175:37: note: expanded from macro 'X2'
        __asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.y) : "v" (t.y), "v" (b.y)); \
                                           ^
<kernel>:200:3: error: invalid output constraint '=v' in asm
  X2(u[2], u[3]);
  ^
<kernel>:174:37: note: expanded from macro 'X2'
        __asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.x) : "v" (t.x), "v" (b.x)); \
                                           ^
<kernel>:200:3: error: invalid output constraint '=v' in asm
<kernel>:175:37: note: expanded from macro 'X2'
        __asm( "v_add_f64 %0, %1, -%2\n" : "=v" (b.y) : "v" (t.y), "v" (b.y)); \
                                           ^
<kernel>:266:3: error: invalid output constraint '=v' in a
2019-11-15 18:01:37 Exception gpu_error: BUILD_PROGRAM_FAILURE clBuildProgram at clwrap.cpp:229 build
2019-11-15 18:01:37 Bye

Last fiddled with by storm5510 on 2019-11-15 at 23:09 Reason: Additional
Old 2019-11-15, 23:34   #743
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA

37×59 Posts

Quote:
Originally Posted by storm5510 View Post
Thank you for the reply!

The roundoff error never exceeded 0.035. I believe this is what is displayed on every line as "err." Also, it completed the GCD before it dropped out. There was not a results file.

Windows 10, which I am running, has a very detailed Task Manager. It displays information about a GPU, if one is present. At the time, I didn't think the GPU's RAM usage was excessive. It is possible I was looking at stage 1 though.

As for gpuOwl, I have seen posts about it going back a while. I was under the impression it was a Linux-only project. I will give it a try. It uses OpenCL. GPU-Z says I have this capability, but I have never run anything that uses it.

Microsoft has continually updated Windows 10. What I have is v1903 plus some maintenance updates. After a point, the older versions of CUDAPm1 and CUDALucas simply would not start. I replaced them with newer ones I found in James' archive on mersenne.ca. The newer ones did not seem to have any problems, until today.

It will take me a while to digest everything in your links. I appreciate the effort.

Edit

I tried gpuOwl. It gives me all the below and then exits. It would help to see a config file and worktodo example.


Code:
[gpuowl v6.11-9 OpenCL build log, identical to the one posted in #742: repeated "invalid output constraint '=v' in asm" errors, ending in BUILD_PROGRAM_FAILURE]
Use -use ORIG_X2 either in the command line or config.txt - honestly that should be the default when running under windows...
Old 2019-11-16, 00:10   #744
storm5510
Random Account
 
Aug 2009
Not U. + S.A.

2²×5×11×13 Posts

Quote:
Originally Posted by kracker View Post
Use -use ORIG_X2 either in the command line or config.txt - honestly that should be the default when running under windows...
There is no example configuration text in the archive, so I do not know how to format it. The same applies to a worktodo file.

Edit: Please disregard. With some experimentation, I have it running.

Last fiddled with by storm5510 on 2019-11-16 at 00:50
Old 2019-11-16, 16:04   #745
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2⁴·3·163 Posts

Quote:
Originally Posted by storm5510 View Post
Thank you for the reply!

The roundoff error never exceeded 0.035. I believe this is what is displayed on every line as "err." Also, it completed the GCD before it dropped out. There was not a results file.

Windows 10, which I am running, has a very detailed Task Manager. It displays information about a GPU, if one is present. At the time, I didn't think the GPU's RAM usage was excessive. It is possible I was looking at stage 1 though.

As for gpuOwl, I have seen posts about it going back a while. I was under the impression it was a Linux-only project. I will give it a try. It uses OpenCL. GPU-Z says I have this capability, but I have never run anything that uses it.

Microsoft has continually updated Windows 10. What I have is v1903 plus some maintenance updates. After a point, the older versions of CUDAPm1 and CUDALucas simply would not start. I replaced them with newer ones I found in James' archive on mersenne.ca. The newer ones did not seem to have any problems, until today.

It will take me a while to digest everything in your links. I appreciate the effort.

Edit

I tried gpuOwl. It gives me all the below and then exits. It would help to see a config file and worktodo example.


Code:
[gpuowl v6.11-9 OpenCL build log, identical to the one posted in #742: repeated "invalid output constraint '=v' in asm" errors, ending in BUILD_PROGRAM_FAILURE]
In my experience CUDAPm1 can silently exit in stage 2 without ever displaying an excessive roundoff error. That does not mean one didn't occur, only that no message reporting it was displayed. Sometimes specifying a larger FFT length than the one it selects for itself and running again will get it through the bad spot.
For the gpuowl issue you had, try using -use ORIG_X2 or -use FMA_X2 in the gpuowl command line. It's defaulting to INLINE_X2 and not handling it well. ORIG or FMA may be faster; try and see.
GpuOwl has always been developed on linux. However, going back to the very earliest versions, it has also been ported to Windows. Mostly kracker and I have posted Windows compiled versions. (Hope I haven't slighted anyone there.)
The bigger change was when it also was made to work on NVIDIA even though Preda generally only owns AMD gpus for development testing. I even tested it on Intel IGPs a couple times.
Old 2019-11-16, 16:34   #746
storm5510
Random Account
 
Aug 2009
Not U. + S.A.

2²×5×11×13 Posts

Quote:
Originally Posted by kriesel View Post
In my experience CUDAPm1 can silently exit in stage 2 without ever displaying an excessive roundoff error. That does not mean one didn't occur, only that no message reporting it was displayed. Sometimes specifying a larger FFT length than the one it selects for itself and running again will get it through the bad spot.

For the gpuowl issue you had, try using -use ORIG_X2 or -use FMA_X2 in the gpuowl command line. It's defaulting to INLINE_X2 and not handling it well. ORIG or FMA may be faster; try and see.

GpuOwl has always been developed on linux. However, going back to the very earliest versions, it has also been ported to Windows. Mostly kracker and I have posted Windows compiled versions. (Hope I haven't slighted anyone there.)
The bigger change was when it also was made to work on NVIDIA even though Preda generally only owns AMD gpus for development testing. I even tested it on Intel IGPs a couple times.
I did not know it was possible to specify a higher FFT for CUDAPm1, and I do not know how. The newer release, from this year, I have been running has been extremely reliable except for what I wrote of above.

As for gpuOwl, I did not have the entire package, just an update. I found the rest in James Heinrich's archive on mersenne.ca. I recall writing this in another topic. No matter. Despite having to do some experimentation, it runs quite well now. I successfully guessed a config.txt layout. It still refuses my worktodo.txt files though. Incorrect syntax, I believe it says.
Old 2019-11-16, 17:58   #747
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2⁴·3·163 Posts

Quote:
Originally Posted by storm5510 View Post
I did not know it was possible to specify a higher FFT for CUDAPm1, and I do not know how.
...
As for gpuOwl... It still refuses my worktodo.txt files though. Incorrect syntax, I believe it says.
Specify the FFT length on the command line with the -f option. For example,
cudapm1 -f 4608k

When in doubt, use -h to see the options. And note, despite what that says, -r does nothing.
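
For scripting repeated runs, a small Python helper can assemble the command line. This is only a sketch: the function name is hypothetical, it uses just the options already shown in this thread (-b1, -b2, -f), and combining them in this order is my assumption, so check the -h output for your build before relying on it.

```python
def cudapm1_argv(exponent, b1=None, b2=None, fft_k=None, exe="cudapm1"):
    """Build an argument list for CUDAPm1 from the options seen in this thread.

    fft_k is the FFT length in K, passed as e.g. "-f 4608k".
    The option combination and ordering are assumptions, not documented behavior.
    """
    argv = [exe, str(exponent)]
    if b1 is not None:
        argv += ["-b1", str(b1)]
    if b2 is not None:
        argv += ["-b2", str(b2)]
    if fft_k is not None:
        argv += ["-f", f"{fft_k}k"]
    return argv


if __name__ == "__main__":
    # Print the command; pass the list to subprocess.run() to actually launch it.
    print(" ".join(cudapm1_argv(98181383, b1=710000, b2=13490000, fft_k=6144)))
```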

For worktodo syntax for any of the common GIMPS applications, see https://www.mersenneforum.org/showpo...8&postcount=22
Old 2019-11-16, 23:27   #748
storm5510
Random Account
 
Aug 2009
Not U. + S.A.

2²·5·11·13 Posts

Quote:
Originally Posted by kriesel View Post
Specify the FFT length on the command line with the -f option. For example,
cudapm1 -f 4608k

When in doubt, use -h to see the options. And note, despite what that says, -r does nothing.

For worktodo syntax for any of the common GIMPS applications, see https://www.mersenneforum.org/showpo...8&postcount=22
Thank you!

I was not sure about the command line format as I have never run it this way before. The "-h" command displayed a lot of other things, but not much about the basics. However, I have it running.

For what I was running, it decided on 5760 for an FFT size. I believe this is what it was using before when it quietly stopped. The command line insisted on multiples of 1024, so I set it to 6144. Now it is wait and see.
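
As a rough way to compare FFT sizes, the gpuowl log earlier in this thread reports "17.02 bits/word" for this exponent at FFT 5632K. The Python sketch below reproduces that figure as exponent divided by FFT length, assuming K means 1024 words and that the CUDAPm1 sizes 5760 and 6144 are in the same units. The function name is my own. Fewer bits per word generally leaves more headroom against roundoff error, at the cost of a slower run.

```python
def bits_per_word(exponent, fft_k):
    """Average number of bits of the Mersenne number carried per FFT word.

    Assumes fft_k is the FFT length in units of 1024 words.
    """
    return exponent / (fft_k * 1024)


if __name__ == "__main__":
    exponent = 98181383
    for fft_k in (5632, 5760, 6144):
        print(f"{fft_k}K: {bits_per_word(exponent, fft_k):.2f} bits/word")
```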

I hope you do not mind: I copied all the text from your link above and saved it here. I have been looking for something like this for years!


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
