|
|
#705 |
|
"Ghetto_Child"
Jul 2014
Montreal, QC, Canada
41 Posts |
Code:
CUDAPm1 v0.20
------- DEVICE 0 -------
name GeForce GTX 770
Compatibility 3.0
clockRate (MHz) 1202
memClockRate (MHz) 3505
totalGlobalMem zu
totalConstMem zu
l2CacheSize 524288
sharedMemPerBlock zu
regsPerBlock 65536
warpSize 32
memPitch zu
maxThreadsPerBlock 1024
maxThreadsPerMP 2048
multiProcessorCount 8
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 2147483647,65535,65535
textureAlignment zu
deviceOverlap 1
CUDA reports 3961M of 4096M GPU memory free.
Index 88
No GeForce GTX 770 threads.txt file found. Using default thread sizes.
For optimal thread selection, please run
./CUDALucas -cufftbench 9216 9216 r
for some small r, 0 < r < 6 e.g.
Using threads: norm1 256, mult 256, norm2 128.
Using up to 4536M GPU memory.
WARNING: There may not be enough GPU memory for stage 2!
Selected B1=1515000, B2=45071250, 5.11% chance of finding a factor
Starting stage 1 P-1, M150000713, B1 = 1515000, B2 = 45071250, fft length = 9216K
Doing 2186688 iterations
Iteration 400000 M150000713, 0x****************, n = 9216K, CUDAPm1 v0.20 err = 0.02441 (1:51:01 real, 16.6531 ms/iter, ETA 8:15:53)
Iteration 800000 M150000713, 0x****************, n = 9216K, CUDAPm1 v0.20 err = 0.02441 (1:51:05 real, 16.6628 ms/iter, ETA 6:25:06)
Iteration 1200000 M150000713, 0x****************, n = 9216K, CUDAPm1 v0.20 err = 0.02588 (1:50:59 real, 16.6478 ms/iter, ETA 4:33:46)
Iteration 1600000 M150000713, 0x****************, n = 9216K, CUDAPm1 v0.20 err = 0.02588 (1:51:04 real, 16.6604 ms/iter, ETA 2:42:54)
Iteration 2000000 M150000713, 0x****************, n = 9216K, CUDAPm1 v0.20 err = 0.02539 (1:51:01 real, 16.6536 ms/iter, ETA 51:49)
M150000713, 0x****************, n = 9216K, CUDAPm1 v0.20
Stage 1 complete, estimated total time = 10:06:59
Starting stage 1 gcd.
M150000713 Stage 1 found no factor (P-1, B1=1515000, B2=45071250, e=2, n=9216K CUDAPm1 v0.20)
Starting stage 2.
Using b1 = 1515000, b2 = 45071250, d = 4620, e = 2, nrp = 51
C:/Users/filbert/Documents/Visual Studio 2010/Projects/CUDAPm1/CUDAPm1.cu(3356) : cudaSafeCall() Runtime API error 2: out of memory.
CUDA reports 3949M of 4096M GPU memory free.
Index 96
No GeForce GTX 770 threads.txt file found. Using default thread sizes.
For optimal thread selection, please run
./CUDALucas -cufftbench 11200 11200 r
for some small r, 0 < r < 6 e.g.
Using threads: norm1 256, mult 256, norm2 128.
Using up to 4637M GPU memory.
WARNING: There may not be enough GPU memory for stage 2!
Selected B1=2075000, B2=68993750, 5.91% chance of finding a factor
Starting stage 1 P-1, M200001187, B1 = 2075000, B2 = 68993750, fft length = 11200K
Doing 2994040 iterations
Iteration 400000 M200001187, 0x****************, n = 11200K, CUDAPm1 v0.20 err = 0.23438 (2:23:49 real, 21.5717 ms/iter, ETA 15:32:37)
C:/Users/filbert/Documents/Visual Studio 2010/Projects/CUDAPm1/CUDAPm1.cu(1130) : cudaSafeCall() Runtime API error 30: unknown error. |
|
|
|
|
|
#706 | ||
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5440₁₀ Posts |
Quote:
It's not necessary to mask P-1 interim residues, and masking them might conceal symptoms such as known-bad, repeating, or cycling residues.
Take the following quoted line of your output very seriously. CUDAPm1 v0.20 is known to run for hours or days, uselessly producing unchanging stage 2 interim residues, in such a case. I think the memory crunch is a bit more severe in v0.22, although that version contains some bug fixes, so you could give it a try. You could also try dialing back the exponent so the run fits in your small system RAM. I have no CUDAPm1 experience with 3GB of system RAM, or with GPU RAM larger than system RAM.
Quote:
Runtime API error 30 is typically the NVIDIA driver timeout and recovery issue in Windows. See CUDALucas issue 1 in the attachment at http://www.mersenneforum.org/showpost.php?p=488524&postcount=3
For a possible way of recovering, see batch wrapper files and DEVCON http://www.mersenneforum.org/showpos...3&postcount=10
Good luck. Last fiddled with by kriesel on 2019-04-23 at 19:49 |
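The point about masked residues can be made concrete with a small log check. What follows is only a sketch under stated assumptions: it assumes interim residues appear as `0x` plus 16 hex digits, as in the stage 1 lines posted in this thread, and that stage 2 lines carry the residue the same way (the thread does not show one). The function name `repeated_residues` is made up for illustration and is not part of CUDAPm1.

```python
import re
from collections import Counter

# A 64-bit interim residue as printed in the logs in this thread,
# e.g. 0xf4e102b03fc12715. Assumption: stage 2 lines use the same form.
RESIDUE = re.compile(r"0x([0-9a-fA-F]{16})\b")
# A residue the poster replaced with asterisks before posting.
MASKED = re.compile(r"0x\*{16}")


def repeated_residues(log_text):
    """Return (residues seen on more than one line, number of masked lines).

    Simplification: residues are counted across the whole log, without
    separating exponents or stages.
    """
    counts = Counter()
    masked = 0
    for line in log_text.splitlines():
        if MASKED.search(line):
            masked += 1  # nothing to compare once the value is hidden
            continue
        match = RESIDUE.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    repeats = sorted(res for res, n in counts.items() if n > 1)
    return repeats, masked


if __name__ == "__main__":
    # First line is from the thread; the second is an invented duplicate,
    # added only to show what a stuck run would look like.
    sample = "\n".join([
        "Iteration 400000 M249500501, 0xf4e102b03fc12715, n = 14112K",
        "Iteration 800000 M249500501, 0xf4e102b03fc12715, n = 14112K",
        "Iteration 400000 M150000713, 0x****************, n = 9216K",
    ])
    print(repeated_residues(sample))
```

A masked line can only be counted, never compared, which is exactly why masking hides a run that has stopped making progress.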
||
|
|
|
|
|
#707 |
|
"Ghetto_Child"
Jul 2014
Montreal, QC, Canada
41 Posts |
Off-topic from my errors: how do I specify an FFT size to use per test in the worktodo file? I know that on the command line you just put "-f FFT_LENGTHk". I have not seen anyone specify it in the worktodo file; it would allow more automated scripting.
Thank you for all your help. Last fiddled with by GhettoChild on 2019-04-24 at 03:50 |
|
|
|
|
|
#708 | |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁶·5·17 Posts |
Quote:
I usually don't bother to specify, just let the program pick, and then it can adjust according to excess roundoff error. If you specify a length, it will halt instead of adjusting fft length to get around the error. |
|
|
|
|
|
|
#709 |
|
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
7·29² Posts |
Is that on a 32-bit system by any chance? I can't see any other reason someone would have only 3GB of RAM these days.
|
|
|
|
|
|
#710 | |
|
"Ghetto_Child"
Jul 2014
Montreal, QC, Canada
41₁₀ Posts |
Quote:
@henryzz: It's 64-bit; I put that on everything the CPU permits except my tablet, since that breaks license & driver support. I just can't afford more RAM. It's a DDR2 PC. RAM that old in Montreal, QC, Canada costs a fortune. The entire PC is a collection of donated parts. I was shocked to learn it costs $15-$20 CAD just for a 2" PCI-e 6-pin to 8-pin adaptor here. Another problem: UPS batteries don't exist in stores here, but that's a whole other rant unrelated to this forum.
Got this error just now, the moment I clicked post in the quick reply box. The display went black for a second or two as well. Just posting for reference; I can live with it if the issue is just not enough PC/GPU RAM.
Code:
CUDA reports 3961M of 4096M GPU memory free.
Index 101
No GeForce GTX 770 threads.txt file found. Using default thread sizes.
For optimal thread selection, please run
./CUDALucas -cufftbench 14112 14112 r
for some small r, 0 < r < 6 e.g.
Using threads: norm1 256, mult 256, norm2 128.
Using up to 4851M GPU memory.
WARNING: There may not be enough GPU memory for stage 2!
Selected B1=2565000, B2=90416250, 6.5% chance of finding a factor
Starting stage 1 P-1, M249500501, B1 = 2565000, B2 = 90416250, fft length = 14112K
Doing 3699899 iterations
Iteration 400000 M249500501, 0xf4e102b03fc12715, n = 14112K, CUDAPm1 v0.20 err = 0.25293 (3:01:37 real, 27.2433 ms/iter, ETA 24:58:20)
C:/Users/filbert/Documents/Visual Studio 2010/Projects/CUDAPm1/CUDAPm1.cu(1130) : cudaSafeCall() Runtime API error 30: unknown error.
Last fiddled with by GhettoChild on 2019-04-24 at 13:46 |
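The logs in this thread print both how much GPU memory CUDA reports free and how much the program says it will use "up to", so the mismatch behind the stage 2 warning can be read off mechanically. The sketch below assumes only those two log phrasings as they appear above; the name `memory_shortfalls` is hypothetical. It does not establish that the shortfall caused the error 30 crash.

```python
import re

# Either of the two log phrases quoted in this thread, in order of appearance.
EVENT = re.compile(
    r"CUDA reports (?P<free>\d+)M of \d+M GPU memory free"
    r"|Using up to (?P<want>\d+)M GPU memory"
)


def memory_shortfalls(log_text):
    """Pair each 'CUDA reports' figure with the next 'Using up to' figure.

    Returns a list of (free_MB, wanted_MB) pairs where wanted exceeds free.
    'Using up to' lines with no preceding 'CUDA reports' line are skipped.
    """
    shortfalls = []
    free = None
    for match in EVENT.finditer(log_text):
        if match.group("free") is not None:
            free = int(match.group("free"))
        elif free is not None:
            wanted = int(match.group("want"))
            if wanted > free:
                shortfalls.append((free, wanted))
            free = None  # each report is paired at most once
    return shortfalls


if __name__ == "__main__":
    log = ("CUDA reports 3961M of 4096M GPU memory free.\n"
           "Using up to 4851M GPU memory.\n")
    for free, wanted in memory_shortfalls(log):
        print(f"wants {wanted}M but only {free}M is free")
```

Because it scans the whole text instead of going line by line, it also works on logs that were pasted as one long line.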
|
|
|
|
|
|
#711 |
|
Apr 2019
5×41 Posts |
Is it possible that CUDAPm1 could support finding Fermat factors? I am wondering whether it would be useful for fully factoring F12.
|
|
|
|
|
|
#712 |
|
Romulan Interpreter
Jun 2011
Thailand
2⁶·151 Posts |
It could, but given the amount of ECM already done on F12, you should not expect to find a factor of it by P-1 in the next few thousand years...
|
|
|
|
|
|
#713 | |
|
Apr 2019
5×41 Posts |
Quote:
Anyways, I'd still like to try running this program (more for its intended purpose than F12 now). I tried running the release 0.22 on Linux, but I have CUDA 10.1 installed, so it just spits this message out:
Code:
./CUDAPm1-0.22-cuda10-linux: error while loading shared libraries: libcufft.so.10.0: cannot open shared object file: No such file or directory
Code:
libcufft.so libcufft.so.10 libcufft.so.10.1.105
Assuming 10.0 installs have a similar symlink for 10.0 -> 10, maybe the next release could be improved to support more minor versions by looking for just "xxx.10", with no minor version suffix? Or am I better off just attempting a fresh build of my own? |
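The loader error above comes down to an exact file-name match: the binary asks for `libcufft.so.10.0`, and none of the three installed names is that string. The sketch below just models the comparison, using the names from the post; the helpers `soname_present`, `version_tuple` and `same_major` are invented for illustration. It says nothing about whether a 10.1 library is a safe stand-in for a 10.0 one, which the thread does not establish.

```python
import re


def soname_present(needed, available):
    """The dynamic loader looks for the exact name recorded in the binary."""
    return needed in available


def version_tuple(name):
    """Split the numeric suffix after '.so' into integers.

    'libcufft.so.10.1.105' gives (10, 1, 105); 'libcufft.so' gives ().
    """
    match = re.search(r"\.so((?:\.\d+)*)$", name)
    if match is None:
        raise ValueError(f"not a shared-library name: {name}")
    return tuple(int(part) for part in match.group(1).split(".") if part)


def same_major(needed, available):
    """Installed names that share the needed library's major version."""
    want = version_tuple(needed)[:1]
    return [name for name in available if version_tuple(name)[:1] == want]


if __name__ == "__main__":
    installed = ["libcufft.so", "libcufft.so.10", "libcufft.so.10.1.105"]
    needed = "libcufft.so.10.0"
    print(soname_present(needed, installed))
    print(same_major(needed, installed))
```

Same-major candidates exist, but the loader will not pick them on its own; the choice is between a symlink at your own risk and a fresh build against the installed CUDA version.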
|
|
|
|
|
|
#714 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5440₁₀ Posts |
CUDAPm1 v0.20 has its threshold for the 21952k fft length set a bit too high.
Code:
Device GeForce GTX 1060 3GB
Compatibility 6.1
clockRate (MHz) 1771
memClockRate (MHz) 4004
fft max exp ms/iter
...
21952 392070229 47.6967
23040 411074273 47.8943
Code:
Using threads: norm1 256, mult 512, norm2 1024.
Using up to 2572M GPU memory.
Selected B1=2960000, B2=41440000, 3.52% chance of finding a factor
Starting stage 1 P-1, M392000107, B1 = 2960000, B2 = 41440000, fft length = 21952K
Doing 4269810 iterations
Iteration = 5600, err = 0.41016 >= 0.40, quitting.
Estimated time spent so far: 0:00
Using threads: norm1 256, mult 512, norm2 1024.
Using up to 2744M GPU memory.
Selected B1=3075000, B2=55350000, 3.72% chance of finding a factor
Starting stage 1 P-1, M392000107, B1 = 3075000, B2 = 55350000, fft length = 21952K
Doing 4435766 iterations
Iteration = 1400, err = 0.47754 >= 0.40, quitting.
Estimated time spent so far: 0:00
Using threads: norm1 256, mult 512, norm2 1024.
Using up to 2744M GPU memory.
Selected B1=3075000, B2=55350000, 3.72% chance of finding a factor
Starting stage 1 P-1, M392000107, B1 = 3075000, B2 = 55350000, fft length = 21952K
Doing 4435766 iterations
Iteration = 1400, err = 0.47754 >= 0.40, quitting.
Estimated time spent so far: 0:00
Using threads: norm1 256, mult 128, norm2 128.
Using up to 2700M GPU memory.
Selected B1=2960000, B2=41440000, 3.52% chance of finding a factor
Starting stage 1 P-1, M392000107, B1 = 2960000, B2 = 41440000, fft length = 23040K
Doing 4269810 iterations
SIGINT caught, writing checkpoint.
Estimated time spent so far: 12:29
CUDAPm1 v0.20
------- DEVICE 0 -------
name GeForce GTX 1060 3GB
Compatibility 6.1
clockRate (MHz) 1771
memClockRate (MHz) 4004
totalGlobalMem zu
totalConstMem zu
l2CacheSize 1572864
sharedMemPerBlock zu
regsPerBlock 65536
warpSize 32
memPitch zu
maxThreadsPerBlock 1024
maxThreadsPerMP 2048
multiProcessorCount 9
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 2147483647,65535,65535
textureAlignment zu
deviceOverlap 1
CUDA reports 2927M of 3072M GPU memory free.
Using threads: norm1 256, mult 128, norm2 128.
Using up to 2700M GPU memory.
Selected B1=2960000, B2=41440000, 3.52% chance of finding a factor
Using B1 = 2960000 from savefile.
Continuing stage 1 from a partial result of M392000107 fft length = 23040K, iteration = 15601
|
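To spell out the threshold complaint in the post above: M392000107 sits just under the 392070229 limit listed for the 21952K length, so a plain table lookup picks 21952K, yet the run quits with roundoff error above 0.40. The sketch below is not CUDAPm1's selection code. It is a table lookup over the two rows posted, with a hypothetical `margin` parameter showing how slightly lowering each limit would push this exponent to 23040K.

```python
# (fft length in K, maximum exponent) rows copied from the benchmark table
# in the post above. Only the two rows shown there are included.
FFT_TABLE = [(21952, 392070229), (23040, 411074273)]


def pick_fft(exponent, table=FFT_TABLE, margin=0.0):
    """Return the smallest FFT length (in K) whose limit covers the exponent.

    margin is a hypothetical safety factor, not a CUDAPm1 option:
    margin=0.001 lowers every limit by 0.1% before comparing.
    Returns None when no row in the table is large enough.
    """
    for fft_k, max_exp in sorted(table):
        if exponent <= max_exp * (1.0 - margin):
            return fft_k
    return None


if __name__ == "__main__":
    print(pick_fft(392000107))                # 21952, the length that failed
    print(pick_fft(392000107, margin=0.001))  # 23040, the length that ran
```

The 0.1% figure is arbitrary; it only shows that a small reduction in the 21952K limit is enough to change the choice for this exponent.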
|
|
|
|
|
#715 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1540₁₆ Posts |
Code:
batch wrapper reports Starting cudaPm1-0.22-cuda8.exe on GeForceGTX10603GB at Thu 06/06/2019 17:58:01.61
CUDAPm1 v0.22
------- DEVICE 0 -------
name GeForce GTX 1060 3GB
Compatibility 6.1
clockRate (MHz) 1771
memClockRate (MHz) 4004
totalGlobalMem 3221225472
totalConstMem 65536
l2CacheSize 1572864
sharedMemPerBlock 49152
regsPerBlock 65536
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsPerMP 2048
multiProcessorCount 9
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 2147483647,65535,65535
textureAlignment 512
deviceOverlap 1
CUDA reports 2927M of 3072M GPU memory free.
Using threads: norm1 512, mult 32, norm2 32.
Using up to 2700M GPU memory.
Selected B1=3395000, B2=42437500, 3.6% chance of finding a factor
Starting stage 1 P-1, M392000107, B1 = 3395000, B2 = 42437500, fft length = 23040K
Doing 4898441 iterations
Iteration = 100, err = 0.49584 >= 0.40, quitting.
Estimated time spent so far: 0:00
batch wrapper reports exiting at Thu 06/06/2019 17:59:00.04 |
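For anyone who wants the start/exit bookkeeping shown in the log above without a Windows batch file, here is a rough Python equivalent. It is a sketch only: `run_with_restarts` is a made-up name, it assumes the wrapped program returns a nonzero exit code on failure (not confirmed for CUDAPm1 in this thread), and it does not reset the GPU the way the DEVCON approach mentioned earlier does.

```python
import subprocess
import sys
import time
from datetime import datetime


def run_with_restarts(cmd, max_restarts=3, pause_s=0.0, log=print):
    """Run cmd, logging start and exit times; rerun after a nonzero exit.

    Restarts are capped at max_restarts so that a failure which repeats
    every time cannot loop forever. Returns one exit code per attempt.
    """
    codes = []
    for attempt in range(max_restarts + 1):
        log(f"wrapper reports starting {cmd[0]} at {datetime.now():%Y-%m-%d %H:%M:%S}")
        code = subprocess.run(cmd).returncode
        log(f"wrapper reports exiting with code {code} at {datetime.now():%Y-%m-%d %H:%M:%S}")
        codes.append(code)
        if code == 0 or attempt == max_restarts:
            break
        time.sleep(pause_s)  # give the driver a moment before retrying
    return codes


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; substitute the real
    # executable and its arguments.
    print(run_with_restarts([sys.executable, "-c", "print('work done')"]))
```

A wrapper like this is only useful for transient failures such as the driver timeout. A run that quits on roundoff error, as in the log above, will quit the same way on every restart, which is why the restart count is bounded.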
|
|
|