#650
Feb 2012
the Netherlands
111010₂ Posts
I have some issues getting Stage 2 going with version 0.22. It starts filling the GPU memory all the way to 9200 MB, then just quits (the CMD window closes). I'm using a GTX 1080 Ti with 11 GB of memory (Windows 10 Home x64, driver 411.70).
CMD output: Quote:
#651
"James Heinrich"
May 2004
ex-Northern Ontario
6535₈ Posts
If you're running it by double-clicking the exe, then any message it gives when it terminates would unfortunately be lost. If you open a command prompt first and then run the program, any final error message would remain visible.
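A minimal sketch of the same idea in script form, assuming Python is available on the machine: launch the program from a wrapper that keeps whatever it printed and records the exit code, so something survives even when the window would otherwise close. The function name `run_and_capture` and the log file name are made up for illustration; nothing here is part of CUDAPm1.

```python
import datetime
import subprocess
import sys


def run_and_capture(cmd, log_path=None):
    """Run cmd, return (exit_code, combined_output), optionally append to a log.

    stderr is merged into stdout so the last lines before a halt stay in order.
    """
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    if log_path is not None:
        stamp = datetime.datetime.now().isoformat(timespec="seconds")
        with open(log_path, "a", encoding="utf-8") as log:
            log.write(result.stdout)
            log.write(f"\nwrapper: exit code {result.returncode} at {stamp}\n")
    return result.returncode, result.stdout


# Hypothetical usage (executable name and log name are assumptions):
#   code, output = run_and_capture(["CUDAPm1.exe"], "cudapm1_wrapper.log")
#   print("exit code:", code)
```

Even when the program prints nothing at all, the exit code can hint at whether it ended normally or was stopped abnormally.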
#652
Feb 2012
the Netherlands
2×29 Posts
Tried that; no message whatsoever. It just terminates.
#653
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,419 Posts
That's not unusual for CUDAPm1 v0.20, even with console redirection to a file. As I recall, the original author, owftheevil, posted about certain error cases terminating with no message. From my notes: post 373, 2013-09-23, win64 CUDA 5.5 version attached, with discussion of fftbench parameters and threadbench.
"excessive stage 2 round-off errors simply halt the program without error messages." "there could be some inefficient fft lengths that I haven't looked at yet, which will cause a test to terminate with an excessive round-off error." https://www.mersenneforum.org/showpo...&postcount=373
The memory filling to 9.2 GB on a mere 90M exponent is news. On a Quadro 2000 with v0.20, I had issues completing exponents at 85M on one unit and not on another, and also at 171M.
Last fiddled with by kriesel on 2018-11-20 at 08:26
#654
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,419 Posts
Quote:
That exponent, 89326001, has no P-1 assignment listed and is not available for assignment. https://www.mersenne.org/report_expo...exp_hi=&full=1
I could try it here for confirmation, and maybe to isolate which environment(s) it occurs in. What was the worktodo entry for it? I suspect it was something like PFactor=1,2,89326001,-1,76,2
#655
Feb 2012
the Netherlands
2×29 Posts
Quote:
Worktodo does indeed look like this: Quote:
Last fiddled with by Stef42 on 2018-11-20 at 19:32
#656
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,419 Posts
Quote:
Code:
CUDA reports 10988M of 11264M GPU memory free.
Index 55
Using threads: norm1 32, mult 32, norm2 32.
Using up to 4374M GPU memory.
Selected B1=770000, B2=18672500, 3.37% chance of finding a factor
Starting stage 1 P-1, M89326001, B1 = 770000, B2 = 18672500, fft length = 5184K
...
M89326001 Stage 2 found no factor (P-1, B1=770000, B2=18672500, e=4, n=5184K CUDAPm1 v0.20)
#657
Jan 2011
Dudley, MA, USA
73 Posts
Quote:
Noticing that you have a device with 11 GiB, I'm very curious to find out whether there was another reason for this limitation that I hadn't been able to determine, especially since you mention it "starts filling the GPU memory", which it's trying to malloc, and failing. Could you please do me a favour and fiddle with the UnusedMem value in the .ini file, to see if you can determine a value that doesn't crash? I would start with something like 7168, as that would simulate the old 4 GiB limitation (11 GiB - 4 GiB = 7 GiB; 7 × 1024 = 7168 MiB).
Last fiddled with by aaronhaviland on 2018-11-21 at 03:25
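The arithmetic behind the suggested starting value can be sketched as follows. This assumes UnusedMem is given in MiB (the v0.22 warning quoted in this thread reports a default of "100MiB"). The helper names and the idea of stepping the cap up 1 GiB at a time are illustrative assumptions, not something CUDAPm1 provides.

```python
def unused_mem_mib(total_gib, cap_gib):
    """UnusedMem value (MiB) that leaves at most cap_gib of the card usable."""
    if not 0 < cap_gib <= total_gib:
        raise ValueError("cap must be positive and no larger than total memory")
    return (total_gib - cap_gib) * 1024


def candidate_values(total_gib, start_cap_gib):
    """UnusedMem values to try, raising the usable cap 1 GiB per step.

    Assumed search strategy for illustration only; stops before the cap
    equals the whole card, where UnusedMem would be 0.
    """
    return [unused_mem_mib(total_gib, cap)
            for cap in range(start_cap_gib, total_gib)]


print(unused_mem_mib(11, 4))    # 7168, the starting value suggested above
print(candidate_values(11, 4))  # 7168 down to 1024 in 1024 MiB steps
```

Starting from the largest UnusedMem value and working downward would show at which usable size the silent halt starts to appear, if memory size is in fact the trigger.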
#658
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
152B₁₆ Posts
Interesting benchmarking, followed by a silent halt.
It was an attempt to continue a run that had a silent halt in v0.20. v0.22 did too.
Code:
CUDAPm1 v0.22
Warning: Couldn't find or parse ini file option UnusedMem; using default 100MiB.
------- DEVICE 0 -------
name GeForce GTX 1080 Ti
Compatibility 6.1
clockRate (MHz) 1620
memClockRate (MHz) 5505
totalGlobalMem 11811160064
totalConstMem 65536
l2CacheSize 2883584
sharedMemPerBlock 49152
regsPerBlock 65536
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsPerMP 2048
multiProcessorCount 28
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 2147483647,65535,65535
textureAlignment 512
deviceOverlap 1
No GeForceGTX1080Ti_fft.txt file found. Using default fft lengths.
For optimal fft selection, please run
./CUDAPm1 -cufftbench 1 8192 r
for some small r, 0 < r < 6 e.g.
CUDA reports 10988M of 11264M GPU memory free.
No GeForceGTX1080Ti_threads.txt file found. Running benchmark.
CUDA bench, testing various thread sizes for fft 23040K, doing 15 passes.
fft size = 23040K, square time = 0.0000 msec, threads 32
fft size = 23040K, square time = 0.0000 msec, threads 64
fft size = 23040K, square time = 1.4538 msec, threads 128
fft size = 23040K, square time = 1.4513 msec, threads 256
fft size = 23040K, square time = 1.4494 msec, threads 512
fft size = 23040K, square time = 1.4492 msec, threads 1024
Best square time for fft = 23040K, time: 0.0000, t = 64
fft size = 23040K, ave time = 0.1932 msec, Norm1 threads 32, Norm2 threads 32
fft size = 23040K, ave time = 0.2154 msec, Norm1 threads 32, Norm2 threads 64
fft size = 23040K, ave time = 0.2240 msec, Norm1 threads 32, Norm2 threads 128
fft size = 23040K, ave time = 0.2248 msec, Norm1 threads 32, Norm2 threads 256
fft size = 23040K, ave time = 0.2358 msec, Norm1 threads 32, Norm2 threads 512
fft size = 23040K, ave time = 0.2438 msec, Norm1 threads 32, Norm2 threads 1024
fft size = 23040K, ave time = 0.1219 msec, Norm1 threads 64, Norm2 threads 32
fft size = 23040K, ave time = 0.1329 msec, Norm1 threads 64, Norm2 threads 64
fft size = 23040K, ave time = 0.1421 msec, Norm1 threads 64, Norm2 threads 128
fft size = 23040K, ave time = 0.1421 msec, Norm1 threads 64, Norm2 threads 256
fft size = 23040K, ave time = 0.1437 msec, Norm1 threads 64, Norm2 threads 512
fft size = 23040K, ave time = 0.1453 msec, Norm1 threads 64, Norm2 threads 1024
fft size = 23040K, ave time = 0.0589 msec, Norm1 threads 128, Norm2 threads 32
fft size = 23040K, ave time = 0.0648 msec, Norm1 threads 128, Norm2 threads 64
fft size = 23040K, ave time = 0.0693 msec, Norm1 threads 128, Norm2 threads 128
fft size = 23040K, ave time = 0.0687 msec, Norm1 threads 128, Norm2 threads 256
fft size = 23040K, ave time = 0.0689 msec, Norm1 threads 128, Norm2 threads 512
fft size = 23040K, ave time = 0.0684 msec, Norm1 threads 128, Norm2 threads 1024
fft size = 23040K, ave time = 1.7076 msec, Norm1 threads 256, Norm2 threads 32
fft size = 23040K, ave time = 1.7102 msec, Norm1 threads 256, Norm2 threads 64
fft size = 23040K, ave time = 1.7152 msec, Norm1 threads 256, Norm2 threads 128
fft size = 23040K, ave time = 1.7102 msec, Norm1 threads 256, Norm2 threads 256
fft size = 23040K, ave time = 1.7119 msec, Norm1 threads 256, Norm2 threads 512
fft size = 23040K, ave time = 1.7096 msec, Norm1 threads 256, Norm2 threads 1024
fft size = 23040K, ave time = 1.6909 msec, Norm1 threads 512, Norm2 threads 32
fft size = 23040K, ave time = 1.6939 msec, Norm1 threads 512, Norm2 threads 64
fft size = 23040K, ave time = 1.6924 msec, Norm1 threads 512, Norm2 threads 128
fft size = 23040K, ave time = 1.6930 msec, Norm1 threads 512, Norm2 threads 256
fft size = 23040K, ave time = 1.6909 msec, Norm1 threads 512, Norm2 threads 512
fft size = 23040K, ave time = 1.6869 msec, Norm1 threads 512, Norm2 threads 1024
Average time for fft= 23040K, all threads variations 0.7659 msec, threshold value for valid timings set to 0.7500 of this, 0.5744 msec
Warning, time for fft = 23040K, time: 0.1932 msec, t1 = 32, t2 = 64, t3 = 32 is below threshold 0.5744 msec (0.7500 of average 0.7659)
Warning, time for fft = 23040K, time: 0.2154 msec, t1 = 32, t2 = 64, t3 = 64 is below threshold 0.5744 msec (0.7500 of average 0.7659)
Warning, time for fft = 23040K, time: 0.2240 msec, t1 = 32, t2 = 64, t3 = 128 is below threshold 0.5744 msec (0.7500 of average 0.7659)
Warning, time for fft = 23040K, time: 0.2248 msec, t1 = 32, t2 = 64, t3 = 256 is below threshold 0.5744 msec (0.7500 of average 0.7659)
Warning, time for fft = 23040K, time: 0.2358 msec, t1 = 32, t2 = 64, t3 = 512 is below threshold 0.5744 msec (0.7500 of average 0.7659)
Warning, time for fft = 23040K, time: 0.2438 msec, t1 = 32, t2 = 64, t3 = 1024 is below threshold 0.5744 msec (0.7500 of average 0.7659)
Warning, time for fft = 23040K, time: 0.1219 msec, t1 = 64, t2 = 64, t3 = 32 is below threshold 0.5744 msec (0.7500 of average 0.7659)
Warning, time for fft = 23040K, time: 0.1329 msec, t1 = 64, t2 = 64, t3 = 64 is below threshold 0.5744 msec (0.7500 of average 0.7659)
Warning, time for fft = 23040K, time: 0.1421 msec, t1 = 64, t2 = 64, t3 = 128 is below threshold 0.5744 msec (0.7500 of average 0.7659)
Warning, time for fft = 23040K, time: 0.1421 msec, t1 = 64, t2 = 64, t3 = 256 is below threshold 0.5744 msec (0.7500 of average 0.7659)
Warning, time for fft = 23040K, time: 0.1437 msec, t1 = 64, t2 = 64, t3 = 512 is below threshold 0.5744 msec (0.7500 of average 0.7659)
Warning, time for fft = 23040K, time: 0.1453 msec, t1 = 64, t2 = 64, t3 = 1024 is below threshold 0.5744 msec (0.7500 of average 0.7659)
Warning, time for fft = 23040K, time: 0.0589 msec, t1 = 128, t2 = 64, t3 = 32 is below threshold 0.5744 msec (0.7500 of average 0.7659)
Warning, time for fft = 23040K, time: 0.0648 msec, t1 = 128, t2 = 64, t3 = 64 is below threshold 0.5744 msec (0.7500 of average 0.7659)
Warning, time for fft = 23040K, time: 0.0693 msec, t1 = 128, t2 = 64, t3 = 128 is below threshold 0.5744 msec (0.7500 of average 0.7659)
Warning, time for fft = 23040K, time: 0.0687 msec, t1 = 128, t2 = 64, t3 = 256 is below threshold 0.5744 msec (0.7500 of average 0.7659)
Warning, time for fft = 23040K, time: 0.0689 msec, t1 = 128, t2 = 64, t3 = 512 is below threshold 0.5744 msec (0.7500 of average 0.7659)
Warning, time for fft = 23040K, time: 0.0684 msec, t1 = 128, t2 = 64, t3 = 1024 is below threshold 0.5744 msec (0.7500 of average 0.7659)
Timings below threshold were detected for 18 norm1 / mult / norm2 combinations for fft length 23040K and omitted from consideration for best.
Best time for fft = 23040K, time: 1.6869, t1 = 512, t2 = 64, t3 = 1024
Using threads: norm1 512, mult 128, norm2 128.
No stage 2 checkpoint.
Using up to 10800M GPU memory.
Selected B1=3965000, B2=100116250, 4.25% chance of finding a factor
Using B1 = 3310000 from savefile.
Continuing stage 2 from a partial result of M400001387 fft length = 23040K
Starting stage 2.
batch wrapper reports exit at Tue 11/20/2018 22:03:21.82
23040 411074273 16.5434
23040 32 32 32 17.7388
why so different in v0.22?
Last fiddled with by kriesel on 2018-11-21 at 05:27
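The selection rule the log describes (average all timings, treat anything below 0.75 of the average as invalid, take the fastest of the rest) can be sketched as below. This is a reconstruction from the log messages only, not CUDAPm1's actual source; the function and variable names are made up. The timings are the 30 norm1/norm2 "ave time" values copied from the 23040K run above.

```python
# (norm1 threads) -> ave time in msec for norm2 threads 32, 64, ..., 1024,
# copied from the 23040K benchmark output above.
NORM2_THREADS = (32, 64, 128, 256, 512, 1024)
AVE_TIMES = {
    32:  (0.1932, 0.2154, 0.2240, 0.2248, 0.2358, 0.2438),
    64:  (0.1219, 0.1329, 0.1421, 0.1421, 0.1437, 0.1453),
    128: (0.0589, 0.0648, 0.0693, 0.0687, 0.0689, 0.0684),
    256: (1.7076, 1.7102, 1.7152, 1.7102, 1.7119, 1.7096),
    512: (1.6909, 1.6939, 1.6924, 1.6930, 1.6909, 1.6869),
}


def pick_best(rows, norm2=NORM2_THREADS, fraction=0.75):
    """Return (average, threshold, rejected, best_key, best_time).

    Timings below fraction * average are assumed to be bogus (for example
    a kernel that failed to launch) and are excluded before taking the min.
    """
    timings = {(n1, n2): t
               for n1, row in rows.items()
               for n2, t in zip(norm2, row)}
    average = sum(timings.values()) / len(timings)
    threshold = fraction * average
    rejected = {k: t for k, t in timings.items() if t < threshold}
    valid = {k: t for k, t in timings.items() if t >= threshold}
    best_key = min(valid, key=valid.get)
    return average, threshold, rejected, best_key, valid[best_key]


average, threshold, rejected, best_key, best_time = pick_best(AVE_TIMES)
print(f"average {average:.4f}, threshold {threshold:.4f}")
print(f"{len(rejected)} rejected, best {best_key} at {best_time:.4f} msec")
```

Run on these values, the sketch reproduces the figures in the log: average 0.7659, threshold 0.5744, 18 rejected combinations, and a best time of 1.6869 msec at norm1 512, norm2 1024.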
#659
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
152B₁₆ Posts
Quote:
Code:
batch wrapper reports (re)launch at Tue 11/20/2018 22:43:27.36 reset count 0 of max 3
CUDAPm1 v0.22
------- DEVICE 0 -------
name GeForce GTX 1080 Ti
Compatibility 6.1
clockRate (MHz) 1620
memClockRate (MHz) 5505
totalGlobalMem 11811160064
totalConstMem 65536
l2CacheSize 2883584
sharedMemPerBlock 49152
regsPerBlock 65536
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsPerMP 2048
multiProcessorCount 28
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 2147483647,65535,65535
textureAlignment 512
deviceOverlap 1
CUDA reports 10988M of 11264M GPU memory free.
No entry for fft = 5184k found. Running benchmark.
CUDA bench, testing various thread sizes for fft 5184K, doing 15 passes.
fft size = 5184K, square time = 0.3257 msec, threads 32
fft size = 5184K, square time = 0.3291 msec, threads 64
fft size = 5184K, square time = 0.3289 msec, threads 128
fft size = 5184K, square time = 0.3288 msec, threads 256
fft size = 5184K, square time = 0.3293 msec, threads 512
fft size = 5184K, square time = 0.3300 msec, threads 1024
Best square time for fft = 5184K, time: 0.3257, t = 32
fft size = 5184K, ave time = 0.0443 msec, Norm1 threads 32, Norm2 threads 32
fft size = 5184K, ave time = 0.0534 msec, Norm1 threads 32, Norm2 threads 64
fft size = 5184K, ave time = 0.0524 msec, Norm1 threads 32, Norm2 threads 128
fft size = 5184K, ave time = 0.0525 msec, Norm1 threads 32, Norm2 threads 256
fft size = 5184K, ave time = 0.0522 msec, Norm1 threads 32, Norm2 threads 512
fft size = 5184K, ave time = 0.0526 msec, Norm1 threads 32, Norm2 threads 1024
fft size = 5184K, ave time = 0.4067 msec, Norm1 threads 64, Norm2 threads 32
fft size = 5184K, ave time = 0.4113 msec, Norm1 threads 64, Norm2 threads 64
fft size = 5184K, ave time = 0.4102 msec, Norm1 threads 64, Norm2 threads 128
fft size = 5184K, ave time = 0.4093 msec, Norm1 threads 64, Norm2 threads 256
fft size = 5184K, ave time = 0.4090 msec, Norm1 threads 64, Norm2 threads 512
fft size = 5184K, ave time = 0.4074 msec, Norm1 threads 64, Norm2 threads 1024
fft size = 5184K, ave time = 0.3929 msec, Norm1 threads 128, Norm2 threads 32
fft size = 5184K, ave time = 0.3937 msec, Norm1 threads 128, Norm2 threads 64
fft size = 5184K, ave time = 0.3940 msec, Norm1 threads 128, Norm2 threads 128
fft size = 5184K, ave time = 0.3950 msec, Norm1 threads 128, Norm2 threads 256
fft size = 5184K, ave time = 0.3950 msec, Norm1 threads 128, Norm2 threads 512
fft size = 5184K, ave time = 0.3946 msec, Norm1 threads 128, Norm2 threads 1024
fft size = 5184K, ave time = 0.3882 msec, Norm1 threads 256, Norm2 threads 32
fft size = 5184K, ave time = 0.3883 msec, Norm1 threads 256, Norm2 threads 64
fft size = 5184K, ave time = 0.3884 msec, Norm1 threads 256, Norm2 threads 128
fft size = 5184K, ave time = 0.3877 msec, Norm1 threads 256, Norm2 threads 256
fft size = 5184K, ave time = 0.3869 msec, Norm1 threads 256, Norm2 threads 512
fft size = 5184K, ave time = 0.3877 msec, Norm1 threads 256, Norm2 threads 1024
fft size = 5184K, ave time = 0.3860 msec, Norm1 threads 512, Norm2 threads 32
fft size = 5184K, ave time = 0.3860 msec, Norm1 threads 512, Norm2 threads 64
fft size = 5184K, ave time = 0.3861 msec, Norm1 threads 512, Norm2 threads 128
fft size = 5184K, ave time = 0.3856 msec, Norm1 threads 512, Norm2 threads 256
fft size = 5184K, ave time = 0.3845 msec, Norm1 threads 512, Norm2 threads 512
fft size = 5184K, ave time = 0.3866 msec, Norm1 threads 512, Norm2 threads 1024
Average time for fft= 5184K, all threads variations 0.3256 msec, threshold value for valid timings set to 0.7500 of this, 0.2442 msec
Warning, time for fft = 5184K, time: 0.0443 msec, t1 = 32, t2 = 32, t3 = 32 is below threshold 0.2442 msec (0.7500 of average 0.3256)
Warning, time for fft = 5184K, time: 0.0534 msec, t1 = 32, t2 = 32, t3 = 64 is below threshold 0.2442 msec (0.7500 of average 0.3256)
Warning, time for fft = 5184K, time: 0.0524 msec, t1 = 32, t2 = 32, t3 = 128 is below threshold 0.2442 msec (0.7500 of average 0.3256)
Warning, time for fft = 5184K, time: 0.0525 msec, t1 = 32, t2 = 32, t3 = 256 is below threshold 0.2442 msec (0.7500 of average 0.3256)
Warning, time for fft = 5184K, time: 0.0522 msec, t1 = 32, t2 = 32, t3 = 512 is below threshold 0.2442 msec (0.7500 of average 0.3256)
Warning, time for fft = 5184K, time: 0.0526 msec, t1 = 32, t2 = 32, t3 = 1024 is below threshold 0.2442 msec (0.7500 of average 0.3256)
Timings below threshold were detected for 6 norm1 / mult / norm2 combinations for fft length 5184K and omitted from consideration for best.
Best time for fft = 5184K, time: 0.3845, t1 = 512, t2 = 32, t3 = 512
Using threads: norm1 512, mult 128, norm2 128.
Using up to 10854M GPU memory.
Selected B1=630000, B2=10710000, 1.7% chance of finding a factor
Starting stage 1 P-1, M89326001, B1 = 630000, B2 = 10710000, fft length = 5184K
Doing 908960 iterations
Iteration 100000 M89326001, 0xe14f06f8949c9abe, n = 5184K, CUDAPm1 v0.22 err = 0.05005 (5:50 real, 3.5019 ms/iter, ETA 47:12)
Iteration 200000 M89326001, 0x2270467c553262ac, n = 5184K, CUDAPm1 v0.22 err = 0.04785 (5:52 real, 3.5179 ms/iter, ETA 41:34)
Iteration 300000 M89326001, 0x5a9e1dbc55f055ff, n = 5184K, CUDAPm1 v0.22 err = 0.04785 (5:56 real, 3.5598 ms/iter, ETA 36:07)
Iteration 400000 M89326001, 0x08db3e9c13c343d2, n = 5184K, CUDAPm1 v0.22 err = 0.05078 (5:57 real, 3.5742 ms/iter, ETA 30:19)
Iteration 500000 M89326001, 0x523ce55fab10ec94, n = 5184K, CUDAPm1 v0.22 err = 0.05078 (5:58 real, 3.5762 ms/iter, ETA 24:22)
Iteration 600000 M89326001, 0x54ded79cc40cfee8, n = 5184K, CUDAPm1 v0.22 err = 0.05273 (5:58 real, 3.5774 ms/iter, ETA 18:25)
Iteration 700000 M89326001, 0xc99c3d9fc3a34ec0, n = 5184K, CUDAPm1 v0.22 err = 0.04883 (5:57 real, 3.5727 ms/iter, ETA 12:26)
Iteration 800000 M89326001, 0x9d20b89d1a9a4877, n = 5184K, CUDAPm1 v0.22 err = 0.05273 (5:56 real, 3.5611 ms/iter, ETA 6:28)
Iteration 900000 M89326001, 0xefda9b1094553b12, n = 5184K, CUDAPm1 v0.22 err = 0.04883 (5:56 real, 3.5583 ms/iter, ETA 0:31)
M89326001, 0x05d2c8d87dcf4f23, n = 5184K, CUDAPm1 v0.22
Stage 1 complete, estimated total time = 53:52
Starting stage 1 gcd.
M89326001 Stage 1 found no factor (P-1, B1=630000, B2=10710000, e=0, n=5184K CUDAPm1 v0.22)
Starting stage 2.
Using b1 = 630000, b2 = 10710000, d = 2310, e = 12, nrp = 240
Zeros: 475228, Ones: 552452, Pairs: 105088
Processing 1 - 240 of 480 relative primes.
Initializing pass... done. transforms: 17421, err = 0.04785, (31.27 real, 1.7951 ms/tran, ETA NA)
Transforms: 205710 M89326001, 0x90102bd269087607, n = 5184K, CUDAPm1 v0.22 err = 0.04883 (6:20 real, 1.8476 ms/tran, ETA 31:27)
Transforms: 196446 M89326001, 0x266c3a943dd54799, n = 5184K, CUDAPm1 v0.22 err = 0.05273 (6:08 real, 1.8721 ms/tran, ETA 25:33)
Transforms: 201980 M89326001, 0x621dda916a4e4cbb, n = 5184K, CUDAPm1 v0.22 err = 0.04883 (6:18 real, 1.8750 ms/tran, ETA 19:21)
Processing 241 - 480 of 480 relative primes.
Initializing pass... done. transforms: 20111, err = 0.04785, (37.16 real, 1.8476 ms/tran, ETA 18:45)
Transforms: 205504 M89326001, 0x750bff764daa4a29, n = 5184K, CUDAPm1 v0.22 err = 0.05078 (6:25 real, 1.8733 ms/tran, ETA 12:23)
Transforms: 196422 M89326001, 0x5945c6a5e2e76c0e, n = 5184K, CUDAPm1 v0.22 err = 0.04883 (6:05 real, 1.8588 ms/tran, ETA 6:16)
Transforms: 201562 M89326001, 0x0e9d8ad7c2845c56, n = 5184K, CUDAPm1 v0.22 err = 0.04883 (6:14 real, 1.8586 ms/tran, ETA 0:00)
Stage 2 complete, 1245156 transforms, estimated total time = 38:39
Starting stage 2 gcd.
M89326001 Stage 2 found no factor (P-1, B1=630000, B2=10710000, e=12, n=5184K CUDAPm1 v0.22)
batch wrapper reports exit at Wed 11/21/2018 0:26:48.00
#660
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1010100101011₂ Posts
Just like V0.20.