[QUOTE=kriesel;488460]See the detailed writeup at [URL]http://www.mersenneforum.org/showpost.php?p=488288&postcount=37[/URL][/QUOTE]
I've run it on Windows 10 x64 v1709 and earlier versions with no issues on any of them. Now MS is pushing 1803 at everyone, and a couple of unrelated applications of mine stopped working after that update. I have not tried this one on 1803 yet, but will.
Here is a Windows 10 x64 v1803 Benchmark:
[QUOTE][CODE]Device              GeForce GTX 1080
Compatibility       6.1
clockRate (MHz)     1835
memClockRate (MHz)  5005

  fft    max exp  ms/iter
    1      22133   0.0208
    2      43633   0.0279
    4      85933   0.0427
   32     657719   0.0448
   36     738083   0.0618
   64    1296011   0.0674
   72    1454273   0.0805
   80    1612249   0.0871
   96    1927129   0.1100
  100    2005673   0.1170
  108    2162543   0.1219
  112    2240863   0.1229
  128    2553659   0.1298
  144    2865601   0.1413
  160    3176779   0.1476
  162    3215629   0.1908
  200    3951977   0.2008
  208    4106587   0.2418
  216    4261051   0.2467
  225    4434721   0.2572
  256    5031737   0.2673
  288    5646379   0.3193
  320    6259537   0.3488
  324    6336103   0.3624
  392    7634537   0.4049
  400    7786967   0.4346
  432    8395997   0.4580
  448    8700169   0.4709
  512    9914521   0.5011
  576   11125619   0.6026
  648   12484649   0.6723
  686   13200581   0.7345
  800   15343429   0.7485
  864   16543493   0.8335
 1024   19535569   0.9290
 1080   20580341   1.1098
 1120   21325891   1.1732
 1125   21419011   1.1940
 1152   21921901   1.2013
 1176   22368691   1.2195
 1296   24599717   1.2244
 1372   26010389   1.4071
 1568   29640913   1.4139
 1600   30232693   1.4501
 1728   32597297   1.5729
 1792   33778141   1.7635
 2048   38492887   1.7638
 2160   40551479   2.1364
 2304   43194913   2.1590
 2592   48471289   2.3442
 2700   50446621   2.7283
 2744   51250889   2.7442
 3136   58404433   2.7904
 3200   59570449   3.1828
 3240   60298969   3.2006
 3584   66556463   3.2837
 4096   75846319   3.5004
 4608   85111207   4.2431
 5184   95507747   4.7036
 5292   97454309   5.3005
 5600  103000823   5.4040
 5832  107174381   5.6629
 6048  111056879   5.8718
 6144  112781477   5.9137
 6272  115080019   5.9963
 6400  117377567   6.2128
 6480  118813021   6.4584
 6912  126558077   6.5339
 7168  131142761   6.6882
 7200  131715607   6.9528
 8192  149447533   7.1364
 9216  167703023   8.5473
 9408  171120919   9.5003
 9600  174537299   9.6540
 9604  174608443   9.9670
 9720  176671801  10.1752
 9800  178094491  10.2449
10080  183071879  10.4060
10240  185914837  10.4187
10368  188188471  10.9104
11200  202952693  10.9833
11664  211176269  11.5556
12096  218826341  11.9058
12544  226753511  12.3132
12800  231280639  12.6450
12960  234109067  13.1032
13824  249369863  13.3780
14336  258403573  13.6714
14400  259532291  14.0879
16384  294471259  15.1884[/CODE][/QUOTE]
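A table like this can be used to pick an FFT length for a given exponent p (the smallest fft whose max exp is at least p) and to get a rough run-time estimate by multiplying about p iterations by that row's ms/iter. A minimal Python sketch of that lookup, using only an excerpt of the rows above; the function names are hypothetical, and "about p iterations" is an approximation, not a measured figure:

```python
# Excerpt of (fft, max_exp, ms_per_iter) rows from the GTX 1080 benchmark above.
ROWS = [
    (4096, 75846319, 3.5004),
    (4608, 85111207, 4.2431),
    (5184, 95507747, 4.7036),
    (5292, 97454309, 5.3005),
    (5600, 103000823, 5.4040),
]


def pick_fft(p):
    """Return the smallest-fft row whose max exponent covers p."""
    for row in sorted(ROWS):
        if row[1] >= p:
            return row
    raise ValueError(f"exponent {p} exceeds the largest listed max exp")


def estimate_hours(p):
    """Rough run time in hours, assuming about p iterations at the row's ms/iter."""
    _, _, ms_per_iter = pick_fft(p)
    return p * ms_per_iter / 1000.0 / 3600.0


print(pick_fft(88_000_000))
print(round(estimate_hours(88_000_000), 1))
```

For p = 88,000,000 this selects the 5184 row and estimates roughly 115 hours on this GTX 1080.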
Reference Material
I was offered "a blog area to consolidate all of your pdfs and guides and stuff" and accepted.
Feel free to have a look and suggest content (G-rated only ;)
General interest GPU-related reference material: [URL]http://www.mersenneforum.org/showthread.php?t=23371[/URL]
CUDAPm1 P-1 factoring with CUDA on GPUs: [URL]http://www.mersenneforum.org/showthread.php?t=23389[/URL]
Future updates to material previously posted in this thread will probably be made in the blog threads, not here. Being able to update posts in place, without a time limit, makes that more manageable there.
P-1 stage 2 residues not reproducing
CUDAPm1 outputs 64-bit residues in stage 1 and stage 2, and I expected them to be reproducible. But look at this. A first run, start to finish on a GTX 1060, gave in part:
[CODE]Iteration 4050000 M425000083, 0x45bcabd2d9a7a6f7, n = 24192K, CUDAPm1 v0.20 err = 0.26563 (44:19 real, 53.1666 ms/iter, ETA 42:48)
M425000083, 0x03b1ecbe222d57ae, n = 24192K, CUDAPm1 v0.20
Stage 1 complete, estimated total time = 60:43:07
Starting stage 1 gcd.
M425000083 Stage 1 found no factor (P-1, B1=2840000, B2=34080000, e=0, n=24192K CUDAPm1 v0.20)
Starting stage 2.
Using b1 = 2840000, b2 = 34080000, d = 2310, e = 2, nrp = 4
Zeros: 1632727, Ones: 1613513, Pairs: 274647
Processing 1 - 4 of 480 relative primes.
Inititalizing pass... done. transforms: 202, err = 0.25000, (5.29 real, 26.1963 ms/tran, ETA NA)
Transforms: 53864 M425000083, 0x240cabc495e881a9, n = 24192K, CUDAPm1 v0.20 err = 0.25000 (25:05 real, 27.9513 ms/tran, ETA 50:04:27)
Processing 5 - 8 of 480 relative primes.
Inititalizing pass... done. transforms: 233, err = 0.21250, (6.80 real, 29.1806 ms/tran, ETA 50:06:05)
Transforms: 54016 M425000083, 0x4113fb6410f7f0d9, n = 24192K, CUDAPm1 v0.20 err = 0.24219 (25:11 real, 27.9686 ms/tran, ETA 49:41:43)
Processing 9 - 12 of 480 relative primes.
Inititalizing pass... done. transforms: 235, err = 0.20703, (6.93 real, 29.4773 ms/tran, ETA 49:42:03)
Transforms: 54058 M425000083, 0x1f056d902f5168a7, n = 24192K, CUDAPm1 v0.20 err = 0.23438 (25:12 real, 27.9701 ms/tran, ETA 49:17:12)
Processing 13 - 16 of 480 relative primes.
Inititalizing pass... done. transforms: 245, err = 0.20703, (7.19 real, 29.3381 ms/tran, ETA 49:17:15)
Transforms: 54030 M425000083, 0x8bec2d947e1fb288, n = 24192K, CUDAPm1 v0.20 err = 0.24219 (25:11 real, 27.9701 ms/tran, ETA 48:52:07)
Processing 17 - 20 of 480 relative primes.
Inititalizing pass... done. transforms: 251, err = 0.20703, (7.33 real, 29.1833 ms/tran, ETA 48:52:11)
Transforms: 54092 M425000083, 0x896d2f455b59709a, n = 24192K, CUDAPm1 v0.20 err = 0.23438 (25:13 real, 27.9710 ms/tran, ETA 48:26:51)
[/CODE]
After it completed, for testing purposes, I copied an early stage 2 interim save file from the 1060's savefile folder and renamed it to a checkfile-type name. I found I also needed a stage 1 file there too, or the run would start over from scratch. I put both in the work folder for a GTX 1050 Ti run and made a corresponding worktodo entry. On the GTX 1050 Ti I got the following; the stage 2 residues don't match for the same nrp groups (9-12 on the 1050 Ti doesn't match 9-12 on the 1060, and so on).
[CODE]on gtx1050ti, from an early stage 2 save file from a GTX1060:
Starting stage 2.
Using b1 = 2840000, b2 = 34080000, d = 2310, e = 2, nrp = 4
Zeros: 1632727, Ones: 1613513, Pairs: 274647
Processing 9 - 12 of 480 relative primes.
Inititalizing pass... done. transforms: 235, err = 0.20313, (9.42 real, 40.0848 ms/tran, ETA 49:43:53)
Transforms: 54058 M425000083, 0xed32e096fa463f09, n = 24192K, CUDAPm1 v0.20 err = 0.25439 (38:33 real, 42.7948 ms/tran, ETA 57:59:19)
Processing 13 - 16 of 480 relative primes.
Inititalizing pass... done. transforms: 245, err = 0.21624, (10.63 real, 43.3778 ms/tran, ETA 58:01:16)
Transforms: 54030 M425000083, 0xb1a4e401a42c9b89, n = 24192K, CUDAPm1 v0.20 err = 0.23438 (38:33 real, 42.8108 ms/tran, ETA 61:49:54)
Processing 17 - 20 of 480 relative primes.
Inititalizing pass... done. transforms: 251, err = 0.20313, (10.92 real, 43.5090 ms/tran, ETA 61:50:48)
Transforms: 54092 M425000083, 0xf7400d0435b23338, n = 24192K, CUDAPm1 v0.20 err = 0.23438 (38:35 real, 42.8107 ms/tran, ETA 63:52:09)
[/CODE]
Exponent, b1, b2, d, e, nrp, zeros, ones and pairs are all the same. The two runs share everything through stage 1, the stage 1 gcd, and nrp groups 1-8 of stage 2. The 9-12 nrp residues and roundoffs differ between the GTX 1060 and the GTX 1050 Ti, although the roundoffs are close and at acceptable levels. The 13-16 and 17-20 nrp residues and roundoffs differ also. Minor differences in roundoff don't concern me; differing residues do. Both runs are CUDAPm1 v0.20 64-bit CUDA 5.5 for Windows, on different host systems of the same model with the same OS version, but with different GPU models. Maybe I got the wrong stage 1 file, one not quite finished, and that threw it off somehow? Ideas?
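When comparing two runs like this, it can help to line up the interim residues by nrp group mechanically rather than by eye. A rough Python sketch, assuming the log format shown above (each "Processing a - b of N relative primes." line followed by a "Transforms: ... M<p>, 0x<residue>" line); the function names are hypothetical, and a group that never reaches its Transforms line would throw off the pairing:

```python
import re
from typing import Dict, List, Tuple

# A "Processing a - b of N relative primes." line, then the first
# "Transforms: <count> M<p>, 0x<residue>" line after it (re.S spans newlines).
GROUP_RE = re.compile(
    r"Processing (\d+) - (\d+) of \d+ relative primes\."
    r".*?Transforms: \d+ M\d+, (0x[0-9a-fA-F]+)",
    re.S,
)


def stage2_residues(log: str) -> Dict[Tuple[int, int], str]:
    """Map each nrp group (first, last) to its interim stage 2 residue."""
    return {(int(a), int(b)): res.lower() for a, b, res in GROUP_RE.findall(log)}


def mismatched_groups(log_a: str, log_b: str) -> List[Tuple[int, int]]:
    """nrp groups present in both logs whose residues differ, in order."""
    ra, rb = stage2_residues(log_a), stage2_residues(log_b)
    return sorted(g for g in ra.keys() & rb.keys() if ra[g] != rb[g])
```

Fed the two excerpts above, this would flag groups 9-12, 13-16 and 17-20.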
[QUOTE=kriesel;490384]CUDAPm1 gives 64-bit residues in stage 1 and stage 2. I thought they would reproduce. ... Maybe I got the wrong stage one file, not quite finished, and that threw it off somehow? Ideas?[/QUOTE] IIRC, CUDAPm1 uses the available memory and the type of GPU to define the optimal magic numbers for P-1 (b1, b2, d, e, nrp). I didn't look at the code, but my guess is that the stage 2 residue it started from was not compatible with, or not correctly reshaped for, the GTX 1050 Ti.
[QUOTE=ET_;490434]IIRC, CUDAPm1 uses the available memory and the type of GPU to define the optimal magic numbers for P-1 (b1, b2, d, e, nrp). I didn't look at the code, but my guess is that the stage 2 residue it started from was not compatible with, or not correctly reshaped for, the GTX 1050 Ti.[/QUOTE]
I had thought it would be safe to move a run from a small-memory GPU to a newer, larger-memory GPU, since the larger one has more than enough room for the bounds, d, e and nrp combination that fit on the more restricted GPU. From a fresh start, a GPU with more memory would probably select bigger bounds to take advantage of the extra memory, but I think that's an optimization, not a requirement. Going the other way, transplanting a run started on a larger-memory GPU to a smaller-memory one, is likely to fail in stage 2, or perhaps even in stage 1, for the reason you describe. The author said so in [url]http://www.mersenneforum.org/showpost.php?p=359086&postcount=421[/url]. I've also found cases where, running start to finish on one GPU, the program selects stage 2 bounds that have no hope of completing successfully, because they require gigabytes more memory than that GPU has. But neither of those cases corresponds to the one I posted about here.

While looking at read_checkpoint_packed and other routines, to write a script that exports CUDAPm1 savefiles to a neutral exchange format, I did not see anything other than these parameters: nothing explicit about how many ROPs or shaders the GPU had or must have, nor how much memory. The residue is a word stream, a pretty simple shape. My impression is that the entire savefile consists of 4-byte unsigned integers (with times in seconds), and that checked out against the total savefile size, to the byte as I recall.
[CODE]# fread (x_packed, 1, sizeof (unsigned) * (end + 25) , fPtr)
# x_packed[end] = q;
# x_packed[end + 1] = 0; // n
# x_packed[end + 2] = 1; // iteration number
# x_packed[end + 3] = 0; // stage
# x_packed[end + 4] = 0; // accumulated time
# x_packed[end + 5] = 0; // b1
# // 6-9 reserved for extending b1
# // 10-24 reserved for stage 2
# x_packed[end + 10] = b2;
# x_packed[end + 11] = d;
# x_packed[end + 12] = e;
# x_packed[end + 13] = nrp;
# x_packed[end + 14] = 0; // m = number of relative primes already finished
# x_packed[end + 15] = 0; // k = how far done with current crop of relative primes
# x_packed[end + 16] = 0; // t = where to find next relative prime in the bit array
# x_packed[end + 17] = 0; // extra initialization transforms from starting in the middle of a pass
# x_packed[end + 18] = itran_done;
# x_packed[end + 19] = ptran_done + num_tran;
# x_packed[end + 20] = itime;
# x_packed[end + 21] = ptime;
#22-24?[/CODE]Words 0 through end-1 of x_packed hold the packed residue; the remaining words are scalars, which the export program I wrote decodes as follows. Note that these might be from an earlier file than the one I used.
[CODE]Format Mersenne Neutral Exchange d0.4
FileOrigin "CUDAPm1export for Windows" "V0.1 2018-06-23" c425000083s2. 2018 Jun 23 20:52:21 UTC
Type P-1 stage 2
Exponent 425000083
Iteration 4098308
N 24772608
AccumulatedTime 216019
B1 2840000
Reserved6 0
Reserved7 0
Reserved8 0
Reserved9 0
B2 34080000
D 2310
E 2
NRP 4
M 8
K 1229
T 8
Midpasstransforms 0
Itran_done 435
PtrandonePlusNumtran 107880
Itime 12
Ptime 3016
Reserved22 0
Reserved23 0
Reserved24 0
DataFormat binary bytes
CRC32 0x07291d0b
DataBinaryByteCount 53125012
EndOfHeader[/CODE]I see nothing GPU-specific there: no ROP or shader counts, not even the thread-count choices for the 3 phases of the computation.
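The size check described above can be reproduced arithmetically. The sketch below assumes the packed residue occupies ceil(q/32) 32-bit words (the value of end), followed by the 25-word trailer read by the fread shown, all stored as little-endian unsigned 32-bit integers. Field names follow the quoted x_packed[end + i] comments; the helper names are hypothetical, not CUDAPm1 functions.

```python
import struct

# Names for the 25 trailing 32-bit words, following the x_packed[end + i]
# assignments quoted above; unlisted slots (6-9, 22-24) are reserved.
TRAILER_NAMES = {
    0: "q", 1: "n", 2: "iteration", 3: "stage", 4: "accumulated_time",
    5: "b1", 10: "b2", 11: "d", 12: "e", 13: "nrp", 14: "m", 15: "k",
    16: "t", 17: "midpass_transforms", 18: "itran_done",
    19: "ptran_done_plus_num_tran", 20: "itime", 21: "ptime",
}
TRAILER_WORDS = 25


def residue_words(q):
    """Assumed number of 32-bit words in the packed residue: ceil(q / 32)."""
    return (q + 31) // 32


def expected_file_size(q):
    """Total savefile size in bytes: residue words plus the 25-word trailer."""
    return 4 * (residue_words(q) + TRAILER_WORDS)


def read_trailer(data, q):
    """Decode the trailer of a savefile image, assuming little-endian uint32."""
    if len(data) != expected_file_size(q):
        raise ValueError(f"size {len(data)} != expected {expected_file_size(q)}")
    words = struct.unpack_from(f"<{TRAILER_WORDS}I", data, 4 * residue_words(q))
    fields = {TRAILER_NAMES.get(i, f"reserved{i}"): w for i, w in enumerate(words)}
    if fields["q"] != q:
        raise ValueError("trailer exponent does not match q")
    return fields
```

For q = 425000083 the residue works out to 13,281,253 words, or 53,125,012 bytes, which matches the DataBinaryByteCount in the export header above; under these assumptions the whole savefile would be 100 bytes larger.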
prime95 P-1 bug since fixed. Is it present in CUDAPm1?
[URL]http://www.mersenneforum.org/showthread.php?t=22776[/URL] describes an issue with prime95 P-1 stage 1 computations, since fixed. Old prime95 source code shows the issue was present at least as far back as the prime95 v28.5 source, dated 2014, and perhaps earlier, although the corresponding code in v27.7, dated 2012, is different. This does not rule out the issue being present in prime95 P-1 when CUDAPm1 was developed (February to November 2013). Since CUDAPm1 development referred to and followed prime95's code, and CUDAPm1 development and maintenance ended well before the issue was found and fixed in prime95, the issue may also be present in the currently available versions of CUDAPm1.
B2 reported may not match B2 used
In CUDAPm1 v0.20, if a run is continued on a GPU with more memory than the one it was started on, new bounds are calculated, and then the program says it will continue with the bounds from the save file. After the run completes, the result record contains the B2 from that new selection calculation, not the save file value the program says it used. Example log excerpts follow.
Using threads: norm1 512, mult 256, norm2 128.
Stage 2 checkpoint found.
Using up to 3780M GPU memory.
Selected B1=3100000, B2=[B]62000000[/B], 3.18% chance of finding a factor
Using B1 = 2840000 from savefile.
Continuing stage 2 from a partial result of M425000083 fft length = 24192K
Starting stage 2.
[B]Using b1 = 2840000, b2 = 34080000[/B], d = 2310, e = 2, nrp = 4
Zeros: 1632727, Ones: 1613513, Pairs: 274647
Processing 9 - 12 of 480 relative primes.
Inititalizing pass... done. transforms: 235, err = 0.20313, (9.42 real, 40.0848 ms/tran, ETA 49:43:53)
Transforms: 54058 M425000083, 0xed32e096fa463f09, n = 24192K, CUDAPm1 v0.20 err = 0.25439 (38:33 real, 42.7948 ms/tran, ETA 57:59:19)
...
Processing 477 - 480 of 480 relative primes.
Inititalizing pass... done. transforms: 299, err = 0.21094, (12.89 real, 43.1271 ms/tran, ETA 39:01)
Transforms: 53916 M425000083, 0x7efe91810f60cfa3, n = 24192K, CUDAPm1 v0.20 err = 0.25000 (38:28 real, 42.8098 ms/tran, ETA 0:46)
Stage 2 complete, 6506485 transforms, estimated total time = 76:55:59
Starting stage 2 gcd.
M425000083 Stage 2 found no factor (P-1, B1=2840000, B2=[B]62000000[/B], e=2, n=24192K CUDAPm1 v0.20)
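A quick way to spot this mismatch in a finished log is to compare the b2 on the "Using b1 = ..., b2 = ..." line with the B2 in the stage 2 result line. A minimal Python sketch, assuming the log lines look like the excerpt above (forum [B] tags are stripped first); it was written only against the "found no factor" result format shown here, and the function name is hypothetical:

```python
import re

TAG_RE = re.compile(r"\[/?B\]")  # forum bold tags, if the log was pasted with them
USED_RE = re.compile(r"Using b1 = (\d+), b2 = (\d+)")
RESULT_RE = re.compile(r"Stage 2 found [^(]*\(P-1, B1=(\d+), B2=(\d+)")


def b2_used_and_reported(log):
    """Return (B2 used in stage 2, B2 in the result line), or None if either is missing."""
    text = TAG_RE.sub("", log)
    used, result = USED_RE.search(text), RESULT_RE.search(text)
    if used is None or result is None:
        return None
    return int(used.group(2)), int(result.group(2))
```

On the excerpt above this returns (34080000, 62000000); any pair with unequal values indicates the result record does not reflect the bound actually run.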
New-to-me GPU, new CUDAPm1 behavior seen
Based on what a 2 GB Quadro 4000 and a 3 GB GTX 1060 can run, I thought a 2.5 GB Quadro 5000 (compute capability 2.0) would also be able to run exponents up to 300M, perhaps higher, with the CUDAPm1 v0.20 x64 CUDA 5.5 20130923 version. It passed a memory test and correctly found the factor for M50001781.
But it failed to run stage 2 on [CODE]M87771547, 0xf6c7342f2bab37fa, n = 5040K, CUDAPm1 v0.20
Stage 1 complete, estimated total time = 3:35:59
Starting stage 1 gcd.
M87771547 Stage 1 found no factor (P-1, B1=755000, B2=17365000, e=0, n=5040K CUDAPm1 v0.20)
Starting stage 2.
Using b1 = 755000, b2 = 17365000, d = 2310, e = 2, nrp = 48
Zeros: 785147, Ones: 880453, Pairs: 172236
Processing 1 - 48 of 480 relative primes.
Inititalizing pass... )
Quitting, estimated time spent = 0:03
[/CODE]With repeated restarts, it repeatedly quit after a few seconds of stage 2, with no reason given. The same thing occurs on [CODE]M200000491, 0x8ef21dc89a0b7d8c, n = 11250K, CUDAPm1 v0.20
Stage 1 complete, estimated total time = 16:20:44
Starting stage 1 gcd.
M200000491 Stage 1 found no factor (P-1, B1=1540000, B2=32340000, e=0, n=11250K CUDAPm1 v0.20)
Starting stage 2.
Using b1 = 1540000, b2 = 32340000, d = 2310, e = 2, nrp = 16
Zeros: 1515937, Ones: 1585823, Pairs: 290684
Processing 1 - 16 of 480 relative primes.
Inititalizing pass... )
Quitting, estimated time spent = 0:03
[/CODE]Again, repeated restarts produce "Quitting" after a few seconds. I'm trying a few other exponents, but for now CUDAPm1 on this GPU model appears incapable of running P-1 stage 2 at exponents of current or future interest (p > 88M), for some unknown reason.
The test exponent 50001781 ran with threads norm1 512, mult 256, norm2 128 and fft length 2688K, which don't appear in the fft file or threads file. The applicable threads-file entries are:
(88M) 5040 64 64 32 11.5743
(200M) 11250 128 64 1024 26.5149
Retrying with 5040 128 64 32 in the threads file, per [URL="http://www.mersenneforum.org/showpost.php?p=359096&postcount=424"]http://www.mersenneforum.org/showpost.php?p=359096&postcount=424[/URL], the 88M exponent progresses. Any ideas what to do to get M200M running stage 2 successfully? Are there any CUDA 5.5 or higher executables for Windows available with the 20131118 or later code fixes?
new behavior: 16 stage 2 residue values taking turns
Anomalous interim stage 2 residues, M350000071 on a Quadro 5000, CUDAPm1 v0.20 20130923 CUDA 5.5 on Windows:
After a normal-looking stage 1, the 120 residues output in stage 2 at NRP=4 are drawn repetitively from a very limited set of 16 values, listed below in ascending order, which look suspicious in their regularity. (I'm used to runs with pseudorandom-looking stage 1 and stage 2 residues. This exponent/GPU combination had seemingly well-behaved stage 1 residues but peculiarities throughout stage 2.)
[CODE]_____8___4___2___1   difference appearing in the respective bit positions
0xfff7fffbfffdfffe
0xfff7fffbfffdffff
0xfff7fffbfffffffe
0xfff7fffbffffffff
0xfff7fffffffdfffe
0xfff7fffffffdffff
0xfff7fffffffffffe
0xfff7ffffffffffff
0xfffffffbfffdfffe
0xfffffffbfffdffff
0xfffffffbfffffffe
0xfffffffbffffffff
0xfffffffffffdfffe
0xfffffffffffdffff
0xfffffffffffffffe
0xffffffffffffffff[/CODE]The end of stage 1 and the beginning of stage 2 looked normal. Stage 2 was using 1863 MB of the 2.5 GB on the GPU; at the stage 2 wrap-up/gcd, usage dropped to 746 MB.
[CODE]Iteration 3650000 M350000071, 0xfa26579b34919a34, n = 20412K, CUDAPm1 v0.20 err = 0.12109 (20:01 real, 48.0195 ms/iter, ETA 22:37)
Iteration 3675000 M350000071, 0x3ca8420d52bd5a27, n = 20412K, CUDAPm1 v0.20 err = 0.11719 (20:01 real, 48.0155 ms/iter, ETA 2:37)
M350000071, 0x509e08b93355b407, n = 20412K, CUDAPm1 v0.20
Stage 1 complete, estimated total time = 49:05:07
Starting stage 1 gcd.
M350000071 Stage 1 found no factor (P-1, B1=2550000, B2=31875000, e=0, n=20412K CUDAPm1 v0.20)
Starting stage 2.
Using b1 = 2550000, b2 = 31875000, d = 2310, e = 2, nrp = 4
Zeros: 1527348, Ones: 1520172, Pairs: 260423
Processing 1 - 4 of 480 relative primes.
Inititalizing pass... done. transforms: 198, err = 0.11328, (4.77 real, 24.0679 ms/tran, ETA NA)
Transforms: 50660 M350000071, 0xfffffffbfffffffe, n = 20412K, CUDAPm1 v0.20 err = 0.11328 (21:53 real, 25.9248 ms/tran, ETA 43:39:45)
Processing 5 - 8 of 480 relative primes.
Inititalizing pass... done. transforms: 229, err = 0.10547, (5.98 real, 26.1210 ms/tran, ETA 43:42:27)
Transforms: 50812 M350000071, 0xfff7fffbfffdffff, n = 20412K, CUDAPm1 v0.20 err = 0.10547 (21:57 real, 25.9243 ms/tran, ETA 43:19:29)
Processing 9 - 12 of 480 relative primes.
Inititalizing pass... done. transforms: 231, err = 0.10547, (5.99 real, 25.9324 ms/tran, ETA 43:20:31)
Transforms: 50810 M350000071, 0xfff7fffffffdffff, n = 20412K, CUDAPm1 v0.20 err = 0.10547 (21:57 real, 25.9239 ms/tran, ETA 42:57:55)
Processing 13 - 16 of 480 relative primes.
Inititalizing pass... done. transforms: 241, err = 0.10547, (6.24 real, 25.8988 ms/tran, ETA 42:58:31)
Transforms: 50762 M350000071, 0xfff7fffbfffffffe, n = 20412K, CUDAPm1 v0.20 err = 0.10547 (21:56 real, 25.9241 ms/tran, ETA 42:35:58)
Processing 17 - 20 of 480 relative primes.
Inititalizing pass... done. transforms: 247, err = 0.10547, (6.40 real, 25.9017 ms/tran, ETA 42:36:30)
Transforms: 50814 M350000071, 0xfffffffbfffffffe, n = 20412K, CUDAPm1 v0.20 err = 0.10547 (21:57 real, 25.9239 ms/tran, ETA 42:14:22)
[/CODE]And so on. The run concluded with a result line reporting no factor found.
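Reading the 8/4/2/1 header bit by bit, each listed value is all ones except that some subset of just four bits, at positions 0, 17, 34 and 51 (17 apart), is cleared. The short Python check below confirms that the 16 listed values are exactly the 2^4 such combinations; it describes the pattern only and says nothing about its cause. The constant and function names are illustrative.

```python
ALL_ONES = (1 << 64) - 1
CLEARED_BITS = (0, 17, 34, 51)  # bit positions under the 1, 2, 4 and 8 columns

# The 16 distinct stage 2 residues listed above, in ascending order.
OBSERVED = [
    0xfff7fffbfffdfffe, 0xfff7fffbfffdffff, 0xfff7fffbfffffffe, 0xfff7fffbffffffff,
    0xfff7fffffffdfffe, 0xfff7fffffffdffff, 0xfff7fffffffffffe, 0xfff7ffffffffffff,
    0xfffffffbfffdfffe, 0xfffffffbfffdffff, 0xfffffffbfffffffe, 0xfffffffbffffffff,
    0xfffffffffffdfffe, 0xfffffffffffdffff, 0xfffffffffffffffe, 0xffffffffffffffff,
]


def pattern_values():
    """All-ones 64-bit words with each subset of CLEARED_BITS cleared, sorted."""
    values = set()
    for mask in range(1 << len(CLEARED_BITS)):
        v = ALL_ONES
        for i, bit in enumerate(CLEARED_BITS):
            if (mask >> i) & 1:
                v &= ~(1 << bit)
        values.add(v)
    return sorted(values)


print(pattern_values() == OBSERVED)
for v in pattern_values():
    print(f"0x{v:016x}")
```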
[QUOTE=kriesel;491191]
Same thing occurs on [CODE]M200000491, 0x8ef21dc89a0b7d8c, n = 11250K, CUDAPm1 v0.20
Stage 1 complete, estimated total time = 16:20:44
Starting stage 1 gcd.
M200000491 Stage 1 found no factor (P-1, B1=1540000, B2=32340000, e=0, n=11250K CUDAPm1 v0.20)
Starting stage 2.
Using b1 = 1540000, b2 = 32340000, d = 2310, e = 2, nrp = 16
Zeros: 1515937, Ones: 1585823, Pairs: 290684
Processing 1 - 16 of 480 relative primes.
Inititalizing pass... )
Quitting, estimated time spent = 0:03[/CODE]
(200M) 11250 128 64 1024 26.5149
Any ideas what to do to get M200M running stage 2 successfully?
[/QUOTE]
Doubling norm1 for the 11250K fft length worked for the 200M exponent.
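Judging from these posts, a threads-file line appears to hold the fft length in K, then the norm1, mult and norm2 thread counts, then a timing; the 88M retry changed 5040 64 64 32 to 5040 128 64 32, i.e. doubled the second field. Under that assumed layout, a sketch of the norm1 doubling in Python (the function names are hypothetical, and the layout is inferred from this thread, not from CUDAPm1 documentation):

```python
def double_norm1(line, fft_k):
    """Double the norm1 field of a threads-file line whose fft length is fft_k.

    Assumed layout: fft_k norm1 mult norm2 [timing]. Other lines pass through.
    """
    fields = line.split()
    if len(fields) < 4 or not fields[0].isdigit() or int(fields[0]) != fft_k:
        return line
    fields[1] = str(2 * int(fields[1]))
    return " ".join(fields)


def patch_threads(text, fft_k):
    """Apply double_norm1 to every line of a threads file's contents."""
    return "\n".join(double_norm1(line, fft_k) for line in text.splitlines())


print(double_norm1("11250 128 64 1024 26.5149", 11250))
```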