Is this where I post my errors?
[CODE]
CUDAPm1 v0.20
------- DEVICE 0 -------
name                GeForce GTX 770
Compatibility       3.0
clockRate (MHz)     1202
memClockRate (MHz)  3505
totalGlobalMem      zu
totalConstMem       zu
l2CacheSize         524288
sharedMemPerBlock   zu
regsPerBlock        65536
warpSize            32
memPitch            zu
maxThreadsPerBlock  1024
maxThreadsPerMP     2048
multiProcessorCount 8
maxThreadsDim[3]    1024,1024,64
maxGridSize[3]      2147483647,65535,65535
textureAlignment    zu
deviceOverlap       1

CUDA reports 3961M of 4096M GPU memory free.
Index 88
No GeForce GTX 770 threads.txt file found. Using default thread sizes.
For optimal thread selection, please run
./CUDALucas -cufftbench 9216 9216 r
for some small r, 0 < r < 6 e.g.
Using threads: norm1 256, mult 256, norm2 128.
Using up to 4536M GPU memory.
WARNING: There may not be enough GPU memory for stage 2!
Selected B1=1515000, B2=45071250, 5.11% chance of finding a factor
Starting stage 1 P-1, M150000713, B1 = 1515000, B2 = 45071250, fft length = 9216K
Doing 2186688 iterations
Iteration 400000 M150000713, 0x****************, n = 9216K, CUDAPm1 v0.20 err = 0.02441 (1:51:01 real, 16.6531 ms/iter, ETA 8:15:53)
Iteration 800000 M150000713, 0x****************, n = 9216K, CUDAPm1 v0.20 err = 0.02441 (1:51:05 real, 16.6628 ms/iter, ETA 6:25:06)
Iteration 1200000 M150000713, 0x****************, n = 9216K, CUDAPm1 v0.20 err = 0.02588 (1:50:59 real, 16.6478 ms/iter, ETA 4:33:46)
Iteration 1600000 M150000713, 0x****************, n = 9216K, CUDAPm1 v0.20 err = 0.02588 (1:51:04 real, 16.6604 ms/iter, ETA 2:42:54)
Iteration 2000000 M150000713, 0x****************, n = 9216K, CUDAPm1 v0.20 err = 0.02539 (1:51:01 real, 16.6536 ms/iter, ETA 51:49)
M150000713, 0x****************, n = 9216K, CUDAPm1 v0.20
Stage 1 complete, estimated total time = 10:06:59
Starting stage 1 gcd.
M150000713 Stage 1 found no factor (P-1, B1=1515000, B2=45071250, e=2, n=9216K CUDAPm1 v0.20)
Starting stage 2.
Using b1 = 1515000, b2 = 45071250, d = 4620, e = 2, nrp = 51
C:/Users/filbert/Documents/Visual Studio 2010/Projects/CUDAPm1/CUDAPm1.cu(3356) : cudaSafeCall() Runtime API error 2: out of memory.

CUDA reports 3949M of 4096M GPU memory free.
Index 96
No GeForce GTX 770 threads.txt file found. Using default thread sizes.
For optimal thread selection, please run
./CUDALucas -cufftbench 11200 11200 r
for some small r, 0 < r < 6 e.g.
Using threads: norm1 256, mult 256, norm2 128.
Using up to 4637M GPU memory.
WARNING: There may not be enough GPU memory for stage 2!
Selected B1=2075000, B2=68993750, 5.91% chance of finding a factor
Starting stage 1 P-1, M200001187, B1 = 2075000, B2 = 68993750, fft length = 11200K
Doing 2994040 iterations
Iteration 400000 M200001187, 0x****************, n = 11200K, CUDAPm1 v0.20 err = 0.23438 (2:23:49 real, 21.5717 ms/iter, ETA 15:32:37)
C:/Users/filbert/Documents/Visual Studio 2010/Projects/CUDAPm1/CUDAPm1.cu(1130) : cudaSafeCall() Runtime API error 30: unknown error.
[/CODE]
I had no other major apps running at the time. Admittedly, I have only 3GB of system RAM versus 4GB of GPU RAM on a GTX 770. The second crash happened the moment I opened a single Incognito Chrome tab, with no other Chrome windows or tabs open, and navigated to this forum to post about the first crash. The PC was not running any other major program. |
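The two memory figures printed in the log above can be compared mechanically. The following is a minimal Python sketch, not part of CUDAPm1; the function name `memory_headroom` is made up for illustration. It pairs each "CUDA reports ... free" figure with the "Using up to ..." figure that follows it and reports the difference, and it works whether or not the log still has its line breaks.

```python
import re

# One match per run: free/total as reported by CUDA, then the planned usage.
RUN_RE = re.compile(
    r"CUDA reports (\d+)M of (\d+)M GPU memory free"
    r".*?Using up to (\d+)M GPU memory",
    re.DOTALL,
)

def memory_headroom(log_text):
    """Return one dict per run with free, total, planned and headroom (all in MB).

    A negative headroom means the program plans to use more memory
    than CUDA reported as free at startup.
    """
    runs = []
    for match in RUN_RE.finditer(log_text):
        free, total, planned = (int(g) for g in match.groups())
        runs.append({"free": free, "total": total,
                     "planned": planned, "headroom": free - planned})
    return runs

# Figures copied from the two runs in the log above.
sample = (
    "CUDA reports 3961M of 4096M GPU memory free. Index 88 "
    "Using up to 4536M GPU memory. "
    "CUDA reports 3949M of 4096M GPU memory free. Index 96 "
    "Using up to 4637M GPU memory."
)
for run in memory_headroom(sample):
    print(run)
```

For both runs the planned figure is larger than the free figure (and larger than the card's 4096M total), which is consistent with the stage 2 memory warning in the log.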
[QUOTE=GhettoChild;514477]I had no other major apps running at the time. Admittedly I have only 3GB of system ram vs 4GB of GPU ram on a GTX 770. The 2nd crash only happened the moment I opened a single InCognito Chrome tab with no other chrome windows or tabs open at all; and only navigated to this mersenneforum attempting to post about the first crash. The PC was not actively doing anything else or running any other major program actively.[/QUOTE] Re the question in your post title, yes. It's not necessary to mask P-1 interim residues. And masking them might conceal symptoms, like known-bad or repeating or cycling residues. Take the following quoted line of your output very seriously. CUDAPM1 v0.20 is known to run for hours or days, uselessly producing unchanging stage 2 interim residues, in such a case. I think the memory crunch is a bit more severe in v0.22 although that contains some bug fixes, so you could give that a try. 
You could try dialing back the exponent so that the run might fit in your small system RAM. I have no CUDAPm1 experience with 3GB of system RAM, or with GPU RAM larger than system RAM.
[QUOTE]WARNING: There may not be enough GPU memory for stage 2![/QUOTE]
See also CUDAPm1 issues 46 and 71 in the attachment at [URL="http://www.mersenneforum.org/showpost.php?p=488534&postcount=3"]http://www.mersenneforum.org/showpost.php?p=4885[B]3[/B]4&postcount=3[/URL]

Runtime API error 30 is typically the NVIDIA driver timeout and recovery issue in Windows. See CUDALucas issue 1 in the attachment at [URL="http://www.mersenneforum.org/showpost.php?p=488524&postcount=3"]http://www.mersenneforum.org/showpost.php?p=4885[B]2[/B]4&postcount=3[/URL]
For a possible way of recovering, see batch wrapper files and DEVCON [URL]http://www.mersenneforum.org/showpost.php?p=488513&postcount=10[/URL]

Good luck. |
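The wrapper idea mentioned above can be illustrated roughly as follows. This is a Python sketch rather than the Windows batch files from the linked post, and it does not reset the driver the way DEVCON would. It assumes the program exits with a nonzero status after a CUDA error, which has not been checked here; the function name `run_with_restarts` and the command in the usage comment are placeholders.

```python
import subprocess
import time

def run_with_restarts(cmd, max_restarts=5, pause_s=30.0,
                      run=subprocess.call, sleep=time.sleep):
    """Run cmd and start it again after a nonzero exit status.

    Returns the number of restarts that were needed, or -1 if the
    command was still failing after max_restarts restarts.
    The run and sleep parameters exist so the loop can be tested
    without launching a real process.
    """
    for attempt in range(max_restarts + 1):
        if run(cmd) == 0:
            return attempt
        if attempt < max_restarts:
            # Give the display driver time to recover before retrying.
            sleep(pause_s)
    return -1

# Hypothetical usage; the executable name is a placeholder:
# run_with_restarts(["CUDAPm1.exe"], max_restarts=10, pause_s=60.0)
```

The pause and retry limit are arbitrary; a run that fails the same way every time (for example, out of memory in stage 2) will not be helped by restarting.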
off-topic from my errors, how do I specify an FFT size to use per test in the worktodo file? I know in command line you just put "[B]-f[/B] [I]FFT_LENGTH[/I][B]k[/B]" . I have not seen anyone specify it in the worktodo file; it would allow more automated scripting.
Thank you for all your help. |
[QUOTE=GhettoChild;514523]off-topic from my errors, how do I specify an FFT size to use per test in the worktodo file? I know in command line you just put "[B]-f[/B] [I]FFT_LENGTH[/I][B]k[/B]" . I have not seen anyone specify it in the worktodo file; it would allow more automated scripting.
Thank you for all your help.[/QUOTE] Read the ini file's comments. I usually don't bother to specify, just let the program pick, and then it can adjust according to excess roundoff error. If you specify a length, it will halt instead of adjusting fft length to get around the error. |
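The difference between letting the program pick and forcing a length can be pictured with a small Python sketch. It illustrates the behaviour described in this reply and is not CUDAPm1's actual code: the names `choose_fft` and `measure_err` are hypothetical, the 0.40 limit is the one printed in the "err = ... >= 0.40, quitting" lines later in this thread, and the error value for 23040K is invented for the example.

```python
def choose_fft(lengths, measure_err, start_index=0, forced=False, limit=0.40):
    """Return the first FFT length (in K) whose roundoff error is below limit.

    lengths      -- candidate lengths in increasing order
    measure_err  -- callable giving the roundoff error seen at a length
    forced       -- if True, only lengths[start_index] is tried and None
                    is returned when it fails (the run halts)
    """
    for fft in lengths[start_index:]:
        if measure_err(fft) < limit:
            return fft
        if forced:
            return None
    return None

# 0.41016 is from a log in this thread; 0.20 is a made-up value.
errors = {21952: 0.41016, 23040: 0.20}

print(choose_fft([21952, 23040], errors.get))               # moves up a length
print(choose_fft([21952, 23040], errors.get, forced=True))  # gives up
```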
Is that on a 32-bit system by any chance? I can't see any other reason someone would have only 3GB of RAM these days.
|
[QUOTE=kriesel;514524]Read the ini file's comments.
I usually don't bother to specify, just let the program pick, and then it can adjust according to excess roundoff error. If you specify a length, it will halt instead of adjusting fft length to get around the error.[/QUOTE]
Last time I tried doing what is listed in the ini file, the program reported that the line was unreadable and skipped it. The ini file instructions might only work for CUDALucas? Also, I don't know where in the worktodo line the FFT length needs to be specified; the position of the variable might affect whether the program accepts it. There are 7 variables per line.

@henryzz: It's 64-bit; I put that on everything the CPU permits, except my tablet, since that breaks license and driver support. I just can't afford more RAM. It's a DDR2 PC. :rant: RAM that old in Montreal, QC, Canada costs a fortune. The entire PC is a collection of donated parts. I was shocked to learn it costs $15-$20 CAD just for a 2" PCI-e 6-pin to 8-pin adaptor here. Another problem: UPS batteries don't exist in stores here, but that's a whole other rant unrelated to this forum.

Got this error just now, the moment I clicked post in the quick reply box. The display went black for a second or two as well. Just posting for reference; I can live with it if the issue is just not enough PC/GPU RAM.
[CODE]
CUDA reports 3961M of 4096M GPU memory free.
Index 101
No GeForce GTX 770 threads.txt file found. Using default thread sizes.
For optimal thread selection, please run
./CUDALucas -cufftbench 14112 14112 r
for some small r, 0 < r < 6 e.g.
Using threads: norm1 256, mult 256, norm2 128.
Using up to 4851M GPU memory.
WARNING: There may not be enough GPU memory for stage 2!
Selected B1=2565000, B2=90416250, 6.5% chance of finding a factor
Starting stage 1 P-1, M249500501, B1 = 2565000, B2 = 90416250, fft length = 14112K
Doing 3699899 iterations
Iteration 400000 M249500501, 0xf4e102b03fc12715, n = 14112K, CUDAPm1 v0.20 err = 0.25293 (3:01:37 real, 27.2433 ms/iter, ETA 24:58:20)
C:/Users/filbert/Documents/Visual Studio 2010/Projects/CUDAPm1/CUDAPm1.cu(1130) : cudaSafeCall() Runtime API error 30: unknown error.
[/CODE] |
Is it possible that CUDAPm1 could support finding Fermat factors? I am wondering whether it would be useful for fully factoring F12.
|
It could, but from the amount of the ECM done to F12, you may not expect to find a factor of it by P-1 in the next few thousand years...
|
[QUOTE=LaurV;516221]It could, but from the amount of the ECM done to F12, you may not expect to find a factor of it by P-1 in the next few thousand years...[/QUOTE]
Ok, the magnitude of this sort of task is starting to sink in a bit. Anyway, I'd still like to try running this program (more for its intended purpose than F12 now). I tried running release 0.22 on Linux, but I have CUDA 10.1 installed, so it just prints this message:
[code]
./CUDAPm1-0.22-cuda10-linux: error while loading shared libraries: libcufft.so.10.0: cannot open shared object file: No such file or directory
[/code]
The CUDA install has these files/symlinks under /usr/local/cuda/lib64:
[code]
libcufft.so
libcufft.so.10
libcufft.so.10.1.105
[/code]
Would it be safe/reliable to create a symlink "libcufft.so.10.0" pointing to the actual 10.1 file? Assuming 10.0 installs have a similar symlink for 10.0 -> 10, maybe the next release could be improved to support more minor versions by looking for just "xxx.10", with no minor version suffix? Or am I better off just attempting a fresh build of my own? |
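Before choosing between a symlink and a rebuild, it can help to see exactly which libcufft names are present. A minimal Python sketch follows, using the directory and file name from the post as defaults; the helper name `cufft_candidates` is made up. It only lists files and says nothing about whether a 10.1 library is binary-compatible with a program linked against 10.0.

```python
import os

def cufft_candidates(lib_dir="/usr/local/cuda/lib64", wanted="libcufft.so.10.0"):
    """Return (wanted_present, other_libcufft_names) for lib_dir.

    wanted_present is True if the exact name the loader asked for exists;
    other_libcufft_names lists the remaining libcufft.so* entries, sorted.
    A missing directory is reported as (False, []).
    """
    try:
        names = sorted(os.listdir(lib_dir))
    except FileNotFoundError:
        return False, []
    others = [n for n in names
              if n.startswith("libcufft.so") and n != wanted]
    return wanted in names, others

present, others = cufft_candidates()
print("requested library present:", present)
print("other libcufft files:", others)
```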
ambitious fft length limit
CUDAPm1 v0.20 has its threshold for the 21952k fft length set a bit too high.
[CODE]Device              GeForce GTX 1060 3GB
Compatibility       6.1
clockRate (MHz)     1771
memClockRate (MHz)  4004

  fft    max exp  ms/iter
...
21952  392070229  47.6967
23040  411074273  47.8943[/CODE]
Attempts to run M392000107 quickly ran into excessive roundoff issues. Forcing it to the next higher fft length, which has very little speed penalty in this case, seems to address it.
[CODE]Using threads: norm1 256, mult 512, norm2 1024.
Using up to 2572M GPU memory.
Selected B1=2960000, B2=41440000, 3.52% chance of finding a factor
Starting stage 1 P-1, M392000107, B1 = 2960000, B2 = 41440000, fft length = 21952K
Doing 4269810 iterations
Iteration = 5600, err = 0.41016 >= 0.40, quitting.
Estimated time spent so far: 0:00

Using threads: norm1 256, mult 512, norm2 1024.
Using up to 2744M GPU memory.
Selected B1=3075000, B2=55350000, 3.72% chance of finding a factor
Starting stage 1 P-1, M392000107, B1 = 3075000, B2 = 55350000, fft length = 21952K
Doing 4435766 iterations
Iteration = 1400, err = 0.47754 >= 0.40, quitting.
Estimated time spent so far: 0:00

Using threads: norm1 256, mult 512, norm2 1024.
Using up to 2744M GPU memory.
Selected B1=3075000, B2=55350000, 3.72% chance of finding a factor
Starting stage 1 P-1, M392000107, B1 = 3075000, B2 = 55350000, fft length = 21952K
Doing 4435766 iterations
Iteration = 1400, err = 0.47754 >= 0.40, quitting.
Estimated time spent so far: 0:00

Using threads: norm1 256, mult 128, norm2 128.
Using up to 2700M GPU memory.
Selected B1=2960000, B2=41440000, 3.52% chance of finding a factor
Starting stage 1 P-1, M392000107, B1 = 2960000, B2 = 41440000, fft length = 23040K
Doing 4269810 iterations
SIGINT caught, writing checkpoint.
Estimated time spent so far: 12:29

CUDAPm1 v0.20
------- DEVICE 0 -------
name                GeForce GTX 1060 3GB
Compatibility       6.1
clockRate (MHz)     1771
memClockRate (MHz)  4004
totalGlobalMem      zu
totalConstMem       zu
l2CacheSize         1572864
sharedMemPerBlock   zu
regsPerBlock        65536
warpSize            32
memPitch            zu
maxThreadsPerBlock  1024
maxThreadsPerMP     2048
multiProcessorCount 9
maxThreadsDim[3]    1024,1024,64
maxGridSize[3]      2147483647,65535,65535
textureAlignment    zu
deviceOverlap       1

CUDA reports 2927M of 3072M GPU memory free.
Using threads: norm1 256, mult 128, norm2 128.
Using up to 2700M GPU memory.
Selected B1=2960000, B2=41440000, 3.52% chance of finding a factor
Using B1 = 2960000 from savefile.
Continuing stage 1 from a partial result of M392000107 fft length = 23040K, iteration = 15601[/CODE] |
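The table lookup that goes wrong here can be sketched in a few lines of Python. This is an illustration only, not CUDAPm1's code: the two table rows are the ones shown in this post, while the function name `pick_fft`, the `margin` parameter and the 0.1% value used below are assumptions introduced for the example.

```python
# (fft length in K, largest exponent the table allows at that length)
FFT_TABLE = [(21952, 392070229), (23040, 411074273)]

def pick_fft(exponent, table=FFT_TABLE, margin=0.0):
    """Return the smallest fft length whose max exponent still covers
    the given exponent after shrinking the limit by a fractional margin.

    margin=0.0 uses the table as is; margin=0.001 treats each limit
    as 0.1% lower than listed. Returns None if no row qualifies.
    """
    for fft, max_exp in table:
        if exponent <= max_exp * (1.0 - margin):
            return fft
    return None

print(pick_fft(392000107))                # table as is
print(pick_fft(392000107, margin=0.001))  # with an arbitrary 0.1% safety margin
```

With the table as listed, M392000107 sits just under the 21952K limit and gets that length; with even a small margin it is pushed to 23040K, which is what forcing the length by hand achieved above.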
v0.22 by comparison
[CODE]batch wrapper reports Starting cudaPm1-0.22-cuda8.exe on GeForceGTX10603GB at Thu 06/06/2019 17:58:01.61
CUDAPm1 v0.22
------- DEVICE 0 -------
name                GeForce GTX 1060 3GB
Compatibility       6.1
clockRate (MHz)     1771
memClockRate (MHz)  4004
totalGlobalMem      3221225472
totalConstMem       65536
l2CacheSize         1572864
sharedMemPerBlock   49152
regsPerBlock        65536
warpSize            32
memPitch            2147483647
maxThreadsPerBlock  1024
maxThreadsPerMP     2048
multiProcessorCount 9
maxThreadsDim[3]    1024,1024,64
maxGridSize[3]      2147483647,65535,65535
textureAlignment    512
deviceOverlap       1

CUDA reports 2927M of 3072M GPU memory free.
Using threads: norm1 512, mult 32, norm2 32.
Using up to 2700M GPU memory.
Selected B1=3395000, B2=42437500, 3.6% chance of finding a factor
Starting stage 1 P-1, M392000107, B1 = 3395000, B2 = 42437500, fft length = [B]23040K[/B]
Doing 4898441 iterations
Iteration = 100, err = 0.49584 >= 0.40, quitting.
Estimated time spent so far: 0:00
batch wrapper reports exiting at Thu 06/06/2019 17:59:00.04
[/CODE] |