mersenneforum.org  

Old 2019-04-23, 16:42   #705
GhettoChild
 
"Ghetto_Child"
Jul 2014
Montreal, QC, Canada

41 Posts
Question Is this where I post my errors?

Code:
CUDAPm1 v0.20
------- DEVICE 0 -------
name                GeForce GTX 770
Compatibility       3.0
clockRate (MHz)     1202
memClockRate (MHz)  3505
totalGlobalMem      zu
totalConstMem       zu
l2CacheSize         524288
sharedMemPerBlock   zu
regsPerBlock        65536
warpSize            32
memPitch            zu
maxThreadsPerBlock  1024
maxThreadsPerMP     2048
multiProcessorCount 8
maxThreadsDim[3]    1024,1024,64
maxGridSize[3]      2147483647,65535,65535
textureAlignment    zu
deviceOverlap       1

CUDA reports 3961M of 4096M GPU memory free.
Index 88
No GeForce GTX 770 threads.txt file found. Using default thread sizes.
For optimal thread selection, please run
./CUDALucas -cufftbench 9216 9216 r
for some small r, 0 < r < 6 e.g.
Using threads: norm1 256, mult 256, norm2 128.
Using up to 4536M GPU memory.
WARNING:  There may not be enough GPU memory for stage 2!
Selected B1=1515000, B2=45071250, 5.11% chance of finding a factor
Starting stage 1 P-1, M150000713, B1 = 1515000, B2 = 45071250, fft length = 9216K
Doing 2186688 iterations
Iteration 400000 M150000713, 0x****************, n = 9216K, CUDAPm1 v0.20 err = 0.02441 (1:51:01 real, 16.6531 ms/iter, ETA 8:15:53)
Iteration 800000 M150000713, 0x****************, n = 9216K, CUDAPm1 v0.20 err = 0.02441 (1:51:05 real, 16.6628 ms/iter, ETA 6:25:06)
Iteration 1200000 M150000713, 0x****************, n = 9216K, CUDAPm1 v0.20 err = 0.02588 (1:50:59 real, 16.6478 ms/iter, ETA 4:33:46)
Iteration 1600000 M150000713, 0x****************, n = 9216K, CUDAPm1 v0.20 err = 0.02588 (1:51:04 real, 16.6604 ms/iter, ETA 2:42:54)
Iteration 2000000 M150000713, 0x****************, n = 9216K, CUDAPm1 v0.20 err = 0.02539 (1:51:01 real, 16.6536 ms/iter, ETA 51:49)
M150000713, 0x****************, n = 9216K, CUDAPm1 v0.20
Stage 1 complete, estimated total time = 10:06:59
Starting stage 1 gcd.
M150000713 Stage 1 found no factor (P-1, B1=1515000, B2=45071250, e=2, n=9216K CUDAPm1 v0.20)
Starting stage 2.
Using b1 = 1515000, b2 = 45071250, d = 4620, e = 2, nrp = 51
C:/Users/filbert/Documents/Visual Studio 2010/Projects/CUDAPm1/CUDAPm1.cu(3356) : cudaSafeCall() Runtime API error 2: out of memory.

CUDA reports 3949M of 4096M GPU memory free.
Index 96
No GeForce GTX 770 threads.txt file found. Using default thread sizes.
For optimal thread selection, please run
./CUDALucas -cufftbench 11200 11200 r
for some small r, 0 < r < 6 e.g.
Using threads: norm1 256, mult 256, norm2 128.
Using up to 4637M GPU memory.
WARNING:  There may not be enough GPU memory for stage 2!
Selected B1=2075000, B2=68993750, 5.91% chance of finding a factor
Starting stage 1 P-1, M200001187, B1 = 2075000, B2 = 68993750, fft length = 11200K
Doing 2994040 iterations
Iteration 400000 M200001187, 0x****************, n = 11200K, CUDAPm1 v0.20 err = 0.23438 (2:23:49 real, 21.5717 ms/iter, ETA 15:32:37)
C:/Users/filbert/Documents/Visual Studio 2010/Projects/CUDAPm1/CUDAPm1.cu(1130) : cudaSafeCall() Runtime API error 30: unknown error.
I had no other major apps running at the time. Admittedly, I have only 3GB of system RAM versus 4GB of GPU RAM on a GTX 770. The second crash happened the moment I opened a single Incognito Chrome tab, with no other Chrome windows or tabs open, and navigated to this forum to post about the first crash. The PC was not doing anything else or running any other major program.
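
The mismatch behind the stage 2 warning is visible in the log itself: the program plans to use up to 4536M while CUDA reports only 3961M free. Below is a minimal Python sketch that flags this from a saved log. It assumes the two log line formats shown above; the helper name and the regular expressions are made up for illustration and are not part of CUDAPm1.

```python
import re

# Patterns for two lines copied from the CUDAPm1 v0.20 output above.
FREE_RE = re.compile(r"CUDA reports (\d+)M of (\d+)M GPU memory free")
PLAN_RE = re.compile(r"Using up to (\d+)M GPU memory")


def memory_shortfalls(log_text):
    """Return (free_MB, planned_MB) pairs where planned exceeds free.

    Hypothetical helper: it pairs each 'CUDA reports' line with the next
    'Using up to' line, which matches the order seen in the logs above.
    """
    shortfalls = []
    free_mb = None
    for line in log_text.splitlines():
        m = FREE_RE.search(line)
        if m:
            free_mb = int(m.group(1))
            continue
        m = PLAN_RE.search(line)
        if m and free_mb is not None:
            planned_mb = int(m.group(1))
            if planned_mb > free_mb:
                shortfalls.append((free_mb, planned_mb))
            free_mb = None
    return shortfalls


sample = """CUDA reports 3961M of 4096M GPU memory free.
Using up to 4536M GPU memory.
CUDA reports 3949M of 4096M GPU memory free.
Using up to 4637M GPU memory."""
print(memory_shortfalls(sample))  # [(3961, 4536), (3949, 4637)]
```

Both runs in this post are flagged, which is consistent with the warning the program printed for each of them.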
Old 2019-04-23, 19:46   #706
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5440₁₀ Posts
Default

Re the question in your post title, yes.
It's not necessary to mask P-1 interim residues, and masking them might conceal symptoms such as known-bad, repeating, or cycling residues.
Take the following quoted line of your output very seriously. In such a case, CUDAPm1 v0.20 is known to run for hours or days, uselessly producing unchanging stage 2 interim residues. I think the memory crunch is a bit more severe in v0.22, but it contains some bug fixes, so you could give it a try. You could also try dialing back the exponent so that the run fits in your small system RAM. I have no CUDAPm1 experience with 3GB of system RAM or with GPU RAM larger than system RAM.

Quote:
WARNING: There may not be enough GPU memory for stage 2!
See also CUDAPm1 issues 46 and 71 in the attachment at http://www.mersenneforum.org/showpost.php?p=488534&postcount=3
Runtime API error 30 is typically the NVIDIA driver timeout and recovery issue in Windows. See CUDALucas issue 1 in the attachment at http://www.mersenneforum.org/showpost.php?p=488524&postcount=3

For a possible way of recovering, see batch wrapper files and DEVCON http://www.mersenneforum.org/showpos...3&postcount=10
Good luck.
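
The batch wrapper idea mentioned above amounts to relaunching the program after it exits with an error. A minimal Python sketch of that loop follows. It assumes the wrapped program signals failure through a nonzero exit code, which has not been verified here for CUDAPm1. The command line is a placeholder, and the sketch does not attempt the DEVCON device reset.

```python
import subprocess
import sys
import time


def run_with_restarts(cmd, max_restarts=3, delay_s=0.0):
    """Run cmd, relaunching it after a nonzero exit code.

    Returns the list of exit codes seen. Assumes the wrapped program
    reports failure through its exit code; that is an assumption of this
    sketch, not documented CUDAPm1 behaviour.
    """
    codes = []
    for _ in range(max_restarts + 1):
        code = subprocess.call(cmd)
        codes.append(code)
        if code == 0:
            break
        time.sleep(delay_s)  # pause before relaunching, e.g. for driver recovery
    return codes


if __name__ == "__main__":
    # Placeholder command; substitute the real executable and its options.
    demo = [sys.executable, "-c", "import sys; sys.exit(0)"]
    print(run_with_restarts(demo))
```

Capping the number of restarts keeps a persistent fault, such as a genuine out-of-memory condition, from looping forever.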

Last fiddled with by kriesel on 2019-04-23 at 19:49
Old 2019-04-24, 03:41   #707
GhettoChild
 
"Ghetto_Child"
Jul 2014
Montreal, QC, Canada

41 Posts
Default

Off-topic from my errors: how do I specify an FFT size to use per test in the worktodo file? I know that on the command line you just put "-f FFT_LENGTHk". I have not seen anyone specify it in the worktodo file; doing so would allow more automated scripting.

Thank you for all your help.

Last fiddled with by GhettoChild on 2019-04-24 at 03:50
Old 2019-04-24, 03:59   #708
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2⁶·5·17 Posts
Default

Quote:
Originally Posted by GhettoChild
Off-topic from my errors: how do I specify an FFT size to use per test in the worktodo file? I know that on the command line you just put "-f FFT_LENGTHk". I have not seen anyone specify it in the worktodo file; doing so would allow more automated scripting.

Thank you for all your help.
Read the ini file's comments.
I usually don't bother to specify, just let the program pick, and then it can adjust according to excess roundoff error. If you specify a length, it will halt instead of adjusting fft length to get around the error.
Old 2019-04-24, 10:34   #709
henryzz
Just call me Henry
 
"David"
Sep 2007
Cambridge (GMT/BST)

7·29² Posts
Default

Is that on a 32-bit system by any chance? I can't see any other reason someone would have only 3GB of RAM these days.
Old 2019-04-24, 13:01   #710
GhettoChild
 
"Ghetto_Child"
Jul 2014
Montreal, QC, Canada

41₁₀ Posts
Talking

Quote:
Originally Posted by kriesel
Read the ini file's comments.
I usually don't bother to specify, just let the program pick, and then it can adjust according to excess roundoff error. If you specify a length, it will halt instead of adjusting fft length to get around the error.
The last time I tried what is listed in the ini file, the program reported that the line was unreadable and skipped it. Might the ini file instructions only work for CUDALucas? Also, I don't know where in the worktodo line the FFT length needs to be specified; the position of the variable might affect whether the program accepts it. There are 7 variables per line.

@henryzz:
It's 64-bit; I put that on everything the CPU permits except my tablet since that breaks license & driver support. I just can't afford more ram. It's a DDR2 PC. RAM that old in Montreal, QC, Canada costs a fortune. The entire PC is a collection of donated parts. I was shocked to learn it costs $15-$20CAD just for a 2" PCI-e 6-pin to 8-pin adaptor here. Another problem, UPS batteries don't exist in stores here; but that's a whole other rant unrelated to this forum.

I got this error just now, the moment I clicked post in the quick reply box. The display also went black for a second or two. Just posting for reference; I can live with it if the issue is just not enough PC/GPU RAM.

Code:
CUDA reports 3961M of 4096M GPU memory free.
Index 101
No GeForce GTX 770 threads.txt file found. Using default thread sizes.
For optimal thread selection, please run
./CUDALucas -cufftbench 14112 14112 r
for some small r, 0 < r < 6 e.g.
Using threads: norm1 256, mult 256, norm2 128.
Using up to 4851M GPU memory.
WARNING:  There may not be enough GPU memory for stage 2!
Selected B1=2565000, B2=90416250, 6.5% chance of finding a factor
Starting stage 1 P-1, M249500501, B1 = 2565000, B2 = 90416250, fft length = 14112K
Doing 3699899 iterations
Iteration 400000 M249500501, 0xf4e102b03fc12715, n = 14112K, CUDAPm1 v0.20 err = 0.25293 (3:01:37 real, 27.2433 ms/iter, ETA 24:58:20)
C:/Users/filbert/Documents/Visual Studio 2010/Projects/CUDAPm1/CUDAPm1.cu(1130) : cudaSafeCall() Runtime API error 30: unknown error.

Last fiddled with by GhettoChild on 2019-04-24 at 13:46
Old 2019-05-09, 03:30   #711
hansl
 
Apr 2019

5×41 Posts
Default

Is it possible that CUDAPm1 could support finding Fermat factors? I am wondering whether it would be useful for fully factoring F12.
Old 2019-05-09, 12:50   #712
LaurV
Romulan Interpreter
 
Jun 2011
Thailand

2⁶·151 Posts
Default

It could, but given the amount of ECM already done on F12, you should not expect to find a factor of it by P-1 in the next few thousand years...
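
For readers unfamiliar with the method, here is a toy Python sketch of P-1 stage 1. It is an illustration only and not how CUDAPm1 implements it: the real program works on Mersenne numbers with FFT multiplication on the GPU and adds a stage 2. The function names and the small test number are chosen for illustration. Stage 1 finds a prime factor p of N only when every prime power dividing p-1 is at most B1.

```python
from math import gcd


def primes_up_to(n):
    """Sieve of Eratosthenes, returning all primes <= n (n >= 2)."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return [i for i, is_p in enumerate(sieve) if is_p]


def pminus1_stage1(n, b1, base=3):
    """Return gcd(base^E - 1, n), where E is the product of the largest
    prime powers not exceeding b1. Assumes gcd(base, n) == 1."""
    x = base
    for p in primes_up_to(b1):
        pk = p
        while pk * p <= b1:  # largest power of p that is <= b1
            pk *= p
        x = pow(x, pk, n)
    return gcd(x - 1, n)


# 299 = 13 * 23: 13 - 1 = 12 = 2^2 * 3 is 5-smooth, 23 - 1 = 22 = 2 * 11 is not.
print(pminus1_stage1(299, 5))   # 13
print(pminus1_stage1(299, 11))  # 299: both factors caught at once, bound too large
```

The second call shows the usual caveat: when B1 covers p-1 for every prime factor, the gcd returns N itself rather than a proper factor.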
Old 2019-05-11, 22:16   #713
hansl
 
Apr 2019

5×41 Posts
Default

Quote:
Originally Posted by LaurV
It could, but given the amount of ECM already done on F12, you should not expect to find a factor of it by P-1 in the next few thousand years...
Ok, the magnitude of this sort of task is starting to sink in a bit.

Anyways, I'd still like to try running this program (more for its intended purpose than F12 now).

I tried running release 0.22 on Linux, but I have CUDA 10.1 installed, so it just spits out this message:
Code:
./CUDAPm1-0.22-cuda10-linux: error while loading shared libraries: libcufft.so.10.0: cannot open shared object file: No such file or directory
The cuda install has these files/symlinks under /usr/local/cuda/lib64:
Code:
libcufft.so              
libcufft.so.10
libcufft.so.10.1.105
Would it be safe/reliable to create symlinks "libcufft.so.10.0" to the actual 10.1 file?

Assuming 10.0 installs have a similar symlink for 10.0 -> 10, maybe the next release could be improved to support more minor versions by looking for just "xxx.10", with no minor version suffix?

Or am I better off just attempting a fresh build of my own?
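
Before choosing between a symlink and a rebuild, it can help to see exactly which library name the loader wants and which installed files share its major version. The Python sketch below does only that, using the error text and file names from this post. The helper names are hypothetical. It says nothing about whether the 10.1 cuFFT library is binary-compatible with a program linked against 10.0, which is the real question behind the symlink.

```python
import re


def missing_soname(error_text):
    """Extract the library name from a loader
    'cannot open shared object file' error message."""
    m = re.search(r"loading shared libraries: (\S+): cannot open", error_text)
    return m.group(1) if m else None


def same_major_candidates(soname, installed):
    """List installed files that share the name and major version of soname.

    Example: 'libcufft.so.10.0' gives the prefix 'libcufft.so.10'.
    """
    m = re.match(r"(.+\.so\.\d+)", soname)
    if not m:
        return []
    prefix = m.group(1)
    return [f for f in installed if f == prefix or f.startswith(prefix + ".")]


err = ("./CUDAPm1-0.22-cuda10-linux: error while loading shared libraries: "
       "libcufft.so.10.0: cannot open shared object file: "
       "No such file or directory")
files = ["libcufft.so", "libcufft.so.10", "libcufft.so.10.1.105"]

wanted = missing_soname(err)
print(wanted)                                # libcufft.so.10.0
print(same_major_candidates(wanted, files))  # ['libcufft.so.10', 'libcufft.so.10.1.105']
```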
Old 2019-06-06, 16:27   #714
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5440₁₀ Posts
Default ambitious fft length limit

CUDAPm1 v0.20 has its threshold for the 21952K fft length set a bit too high.
Code:
Device              GeForce GTX 1060 3GB
Compatibility       6.1
clockRate (MHz)     1771
memClockRate (MHz)  4004

   fft    max exp  ms/iter
...

21952  392070229  47.6967
23040  411074273  47.8943
Attempts to run M392000107 quickly ran into excessive roundoff issues. Forcing it to the next higher fft length, which has very little speed penalty in this case, seems to address it.
Code:
Using threads: norm1 256, mult 512, norm2 1024.
Using up to 2572M GPU memory.
Selected B1=2960000, B2=41440000, 3.52% chance of finding a factor
Starting stage 1 P-1, M392000107, B1 = 2960000, B2 = 41440000, fft length = 21952K
Doing 4269810 iterations
Iteration = 5600, err = 0.41016 >= 0.40, quitting.
Estimated time spent so far: 0:00

Using threads: norm1 256, mult 512, norm2 1024.
Using up to 2744M GPU memory.
Selected B1=3075000, B2=55350000, 3.72% chance of finding a factor
Starting stage 1 P-1, M392000107, B1 = 3075000, B2 = 55350000, fft length = 21952K
Doing 4435766 iterations
Iteration = 1400, err = 0.47754 >= 0.40, quitting.
Estimated time spent so far: 0:00

Using threads: norm1 256, mult 512, norm2 1024.
Using up to 2744M GPU memory.
Selected B1=3075000, B2=55350000, 3.72% chance of finding a factor
Starting stage 1 P-1, M392000107, B1 = 3075000, B2 = 55350000, fft length = 21952K
Doing 4435766 iterations
Iteration = 1400, err = 0.47754 >= 0.40, quitting.
Estimated time spent so far: 0:00

Using threads: norm1 256, mult 128, norm2 128.
Using up to 2700M GPU memory.
Selected B1=2960000, B2=41440000, 3.52% chance of finding a factor
Starting stage 1 P-1, M392000107, B1 = 2960000, B2 = 41440000, fft length = 23040K
Doing 4269810 iterations
    SIGINT caught, writing checkpoint.
Estimated time spent so far: 12:29

CUDAPm1 v0.20
------- DEVICE 0 -------
name                GeForce GTX 1060 3GB
Compatibility       6.1
clockRate (MHz)     1771
memClockRate (MHz)  4004
totalGlobalMem      zu
totalConstMem       zu
l2CacheSize         1572864
sharedMemPerBlock   zu
regsPerBlock        65536
warpSize            32
memPitch            zu
maxThreadsPerBlock  1024
maxThreadsPerMP     2048
multiProcessorCount 9
maxThreadsDim[3]    1024,1024,64
maxGridSize[3]      2147483647,65535,65535
textureAlignment    zu
deviceOverlap       1

CUDA reports 2927M of 3072M GPU memory free.
Using threads: norm1 256, mult 128, norm2 128.
Using up to 2700M GPU memory.
Selected B1=2960000, B2=41440000, 3.52% chance of finding a factor
Using B1 = 2960000 from savefile.
Continuing stage 1 from a partial result of M392000107 fft length = 23040K, iteration = 15601
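
The arithmetic behind this can be sketched in Python. M392000107 sits only about 0.02% below the 21952K limit in the table above, so even a small safety margin pushes the choice to 23040K. The table rows come from this post; the `margin` parameter and the function names are hypothetical and do not correspond to any CUDAPm1 option.

```python
# (fft length in K, max exponent) rows from the benchmark table above.
FFT_TABLE = [(21952, 392070229), (23040, 411074273)]


def bits_per_word(exponent, fft_k):
    """Average number of bits of the exponent carried per FFT word."""
    return exponent / (fft_k * 1024)


def pick_fft(exponent, table=FFT_TABLE, margin=0.0):
    """Return the smallest fft length whose derated limit covers exponent.

    margin is a hypothetical safety factor: 0.01 lowers each maximum
    exponent by 1% before comparing. Returns None if nothing fits.
    """
    for fft_k, max_exp in sorted(table):
        if exponent <= max_exp * (1.0 - margin):
            return fft_k
    return None


p = 392000107
print(pick_fft(p))               # 21952, the length that hit roundoff errors
print(pick_fft(p, margin=0.01))  # 23040
print(round(bits_per_word(p, 21952), 2), round(bits_per_word(p, 23040), 2))
```

The last line shows the load dropping from roughly 17.4 to 16.6 bits per word when moving up one fft length.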
Old 2019-06-07, 01:50   #715
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1540₁₆ Posts
Default v0.22 by comparison

Code:
batch wrapper reports Starting cudaPm1-0.22-cuda8.exe on GeForceGTX10603GB at Thu 06/06/2019 17:58:01.61 
CUDAPm1 v0.22
------- DEVICE 0 -------
name                GeForce GTX 1060 3GB
Compatibility       6.1
clockRate (MHz)     1771
memClockRate (MHz)  4004
totalGlobalMem      3221225472
totalConstMem       65536
l2CacheSize         1572864
sharedMemPerBlock   49152
regsPerBlock        65536
warpSize            32
memPitch            2147483647
maxThreadsPerBlock  1024
maxThreadsPerMP     2048
multiProcessorCount 9
maxThreadsDim[3]    1024,1024,64
maxGridSize[3]      2147483647,65535,65535
textureAlignment    512
deviceOverlap       1

CUDA reports 2927M of 3072M GPU memory free.
Using threads: norm1 512, mult 32, norm2 32.
Using up to 2700M GPU memory.
Selected B1=3395000, B2=42437500, 3.6% chance of finding a factor
Starting stage 1 P-1, M392000107, B1 = 3395000, B2 = 42437500, fft length = 23040K
Doing 4898441 iterations
Iteration = 100, err = 0.49584 >= 0.40, quitting.
Estimated time spent so far: 0:00

batch wrapper reports exiting at Thu 06/06/2019 17:59:00.04
All times are UTC. The time now is 23:18.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.