mersenneforum.org

The P-1 factoring CUDA program (https://www.mersenneforum.org/showthread.php?t=17835)

storm5510 2018-05-27 23:57

[QUOTE=kriesel;488460]See the detailed writeup at [URL]http://www.mersenneforum.org/showpost.php?p=488288&postcount=37[/URL][/QUOTE]

I've run it on Windows 10 x64 v1709 and the versions that came before, with no issues on any of them. Now MS is pushing 1803 to everyone. A couple of unrelated applications of mine stopped working after that update. I have not tried this one on 1803 yet, but will.

storm5510 2018-05-28 00:18

Here is a Windows 10 x64 v1803 Benchmark:

[QUOTE]Device GeForce GTX 1080
Compatibility 6.1
clockRate (MHz) 1835
memClockRate (MHz) 5005

fft max exp ms/iter
1 22133 0.0208
2 43633 0.0279
4 85933 0.0427
32 657719 0.0448
36 738083 0.0618
64 1296011 0.0674
72 1454273 0.0805
80 1612249 0.0871
96 1927129 0.1100
100 2005673 0.1170
108 2162543 0.1219
112 2240863 0.1229
128 2553659 0.1298
144 2865601 0.1413
160 3176779 0.1476
162 3215629 0.1908
200 3951977 0.2008
208 4106587 0.2418
216 4261051 0.2467
225 4434721 0.2572
256 5031737 0.2673
288 5646379 0.3193
320 6259537 0.3488
324 6336103 0.3624
392 7634537 0.4049
400 7786967 0.4346
432 8395997 0.4580
448 8700169 0.4709
512 9914521 0.5011
576 11125619 0.6026
648 12484649 0.6723
686 13200581 0.7345
800 15343429 0.7485
864 16543493 0.8335
1024 19535569 0.9290
1080 20580341 1.1098
1120 21325891 1.1732
1125 21419011 1.1940
1152 21921901 1.2013
1176 22368691 1.2195
1296 24599717 1.2244
1372 26010389 1.4071
1568 29640913 1.4139
1600 30232693 1.4501
1728 32597297 1.5729
1792 33778141 1.7635
2048 38492887 1.7638
2160 40551479 2.1364
2304 43194913 2.1590
2592 48471289 2.3442
2700 50446621 2.7283
2744 51250889 2.7442
3136 58404433 2.7904
3200 59570449 3.1828
3240 60298969 3.2006
3584 66556463 3.2837
4096 75846319 3.5004
4608 85111207 4.2431
5184 95507747 4.7036
5292 97454309 5.3005
5600 103000823 5.4040
5832 107174381 5.6629
6048 111056879 5.8718
6144 112781477 5.9137
6272 115080019 5.9963
6400 117377567 6.2128
6480 118813021 6.4584
6912 126558077 6.5339
7168 131142761 6.6882
7200 131715607 6.9528
8192 149447533 7.1364
9216 167703023 8.5473
9408 171120919 9.5003
9600 174537299 9.6540
9604 174608443 9.9670
9720 176671801 10.1752
9800 178094491 10.2449
10080 183071879 10.4060
10240 185914837 10.4187
10368 188188471 10.9104
11200 202952693 10.9833
11664 211176269 11.5556
12096 218826341 11.9058
12544 226753511 12.3132
12800 231280639 12.6450
12960 234109067 13.1032
13824 249369863 13.3780
14336 258403573 13.6714
14400 259532291 14.0879
16384 294471259 15.1884[/QUOTE]
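
One way to read a benchmark table like this is as a lookup from exponent to the smallest listed FFT length (in K) whose "max exp" covers it. The following is a minimal sketch of that lookup only. The function name and the tuple layout are made up for illustration, and the real program may pick a different length based on its own per-GPU fft file.

```python
from bisect import bisect_left

def smallest_fft_for(exponent, table):
    """Return the first (fft_k, max_exp, ms_per_iter) row whose max_exp >= exponent.

    `table` is a list of rows copied from a benchmark listing.
    Returns None if the exponent is beyond the largest row.
    """
    rows = sorted(table, key=lambda row: row[1])
    limits = [row[1] for row in rows]
    i = bisect_left(limits, exponent)
    return rows[i] if i < len(rows) else None

# Three rows taken from the GTX 1080 benchmark above.
rows = [
    (4608, 85111207, 4.2431),
    (5184, 95507747, 4.7036),
    (5292, 97454309, 5.3005),
]
choice = smallest_fft_for(87771547, rows)
```

Multiplying the ms/iter of the chosen row by the number of stage 1 iterations would then give a rough time estimate for that GPU.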

kriesel 2018-06-05 03:16

Reference Material
 
I was offered "a blog area to consolidate all of your pdfs and guides and stuff" and accepted.
Feel free to have a look and suggest content. (G-rated only;)
General interest gpu related reference material [URL]http://www.mersenneforum.org/showthread.php?t=23371[/URL]
CUDAPm1 P-1 factoring with CUDA on gpus [URL]http://www.mersenneforum.org/showthread.php?t=23389[/URL]
Future updates to material previously posted in this thread will probably occur on the blog threads and not here. Having in-place update without a time limit makes it more manageable there.

kriesel 2018-06-23 22:58

P-1 stage 2 residues not reproducing
 
CUDAPm1 gives 64-bit residues in stage 1 and stage 2. I thought they would reproduce. So look at this. The first run, start to finish on a GTX 1060, gave in part:
[CODE]Iteration 4050000 M425000083, 0x45bcabd2d9a7a6f7, n = 24192K, CUDAPm1 v0.20 err = 0.26563 (44:19 real, 53.1666 ms/iter, ETA 42:48)
M425000083, 0x03b1ecbe222d57ae, n = 24192K, CUDAPm1 v0.20
Stage 1 complete, estimated total time = 60:43:07
Starting stage 1 gcd.
M425000083 Stage 1 found no factor (P-1, B1=2840000, B2=34080000, e=0, n=24192K CUDAPm1 v0.20)
Starting stage 2.
Using b1 = 2840000, b2 = 34080000, d = 2310, e = 2, nrp = 4
Zeros: 1632727, Ones: 1613513, Pairs: 274647
Processing 1 - 4 of 480 relative primes.
Inititalizing pass... done. transforms: 202, err = 0.25000, (5.29 real, 26.1963 ms/tran, ETA NA)
Transforms: 53864 M425000083, 0x240cabc495e881a9, n = 24192K, CUDAPm1 v0.20 err = 0.25000 (25:05 real, 27.9513 ms/tran, ETA 50:04:27)

Processing 5 - 8 of 480 relative primes.
Inititalizing pass... done. transforms: 233, err = 0.21250, (6.80 real, 29.1806 ms/tran, ETA 50:06:05)
Transforms: 54016 M425000083, 0x4113fb6410f7f0d9, n = 24192K, CUDAPm1 v0.20 err = 0.24219 (25:11 real, 27.9686 ms/tran, ETA 49:41:43)

Processing 9 - 12 of 480 relative primes.
Inititalizing pass... done. transforms: 235, err = 0.20703, (6.93 real, 29.4773 ms/tran, ETA 49:42:03)
Transforms: 54058 M425000083, 0x1f056d902f5168a7, n = 24192K, CUDAPm1 v0.20 err = 0.23438 (25:12 real, 27.9701 ms/tran, ETA 49:17:12)

Processing 13 - 16 of 480 relative primes.
Inititalizing pass... done. transforms: 245, err = 0.20703, (7.19 real, 29.3381 ms/tran, ETA 49:17:15)
Transforms: 54030 M425000083, 0x8bec2d947e1fb288, n = 24192K, CUDAPm1 v0.20 err = 0.24219 (25:11 real, 27.9701 ms/tran, ETA 48:52:07)

Processing 17 - 20 of 480 relative primes.
Inititalizing pass... done. transforms: 251, err = 0.20703, (7.33 real, 29.1833 ms/tran, ETA 48:52:11)
Transforms: 54092 M425000083, 0x896d2f455b59709a, n = 24192K, CUDAPm1 v0.20 err = 0.23438 (25:13 real, 27.9710 ms/tran, ETA 48:26:51)
[/CODE]After it completed, for testing purposes, I copied an early stage 2 interim save file from the GTX 1060 savefile folder and renamed it to a checkfile-type name. I found I also needed a stage 1 file there, or the run would start over from scratch. I put both in the work folder for a GTX 1050 Ti run and made a corresponding worktodo entry. On the GTX 1050 Ti I got the output below. The stage 2 residues don't match for the same nrp groups: 9-12 on the 1050 Ti doesn't match 9-12 on the 1060, and so on.
[CODE]on gtx1050ti, from an early stage 2 save file from a GTX1060:
Starting stage 2.
Using b1 = 2840000, b2 = 34080000, d = 2310, e = 2, nrp = 4
Zeros: 1632727, Ones: 1613513, Pairs: 274647
Processing 9 - 12 of 480 relative primes.
Inititalizing pass... done. transforms: 235, err = 0.20313, (9.42 real, 40.0848 ms/tran, ETA 49:43:53)
Transforms: 54058 M425000083, 0xed32e096fa463f09, n = 24192K, CUDAPm1 v0.20 err = 0.25439 (38:33 real, 42.7948 ms/tran, ETA 57:59:19)

Processing 13 - 16 of 480 relative primes.
Inititalizing pass... done. transforms: 245, err = 0.21624, (10.63 real, 43.3778 ms/tran, ETA 58:01:16)
Transforms: 54030 M425000083, 0xb1a4e401a42c9b89, n = 24192K, CUDAPm1 v0.20 err = 0.23438 (38:33 real, 42.8108 ms/tran, ETA 61:49:54)

Processing 17 - 20 of 480 relative primes.
Inititalizing pass... done. transforms: 251, err = 0.20313, (10.92 real, 43.5090 ms/tran, ETA 61:50:48)
Transforms: 54092 M425000083, 0xf7400d0435b23338, n = 24192K, CUDAPm1 v0.20 err = 0.23438 (38:35 real, 42.8107 ms/tran, ETA 63:52:09)
[/CODE]Exponent, b1, b2, d, e, nrp, zeros, ones, and pairs are all the same. The two runs have all of stage 1, the gcd, and nrp 1-8 of stage 2 in common.

For nrp 9-12, 13-16, and 17-20, both the residues and the roundoffs differ between the GTX 1060 and the GTX 1050 Ti. The roundoffs are close and at acceptable levels, and minor roundoff differences don't concern me. Differing residues do. Both runs used CUDAPm1 v0.20 64-bit CUDA 5.5 for Windows, on different host systems of the same model with the same OS version, but with different GPU models.
Maybe I got the wrong stage 1 file, one that was not quite finished, and that threw it off somehow? Ideas?
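
For anyone wanting to repeat this comparison without eyeballing the logs, here is a minimal sketch that pulls the interim stage 2 residue for each "Processing A - B" group out of two console logs and lists the groups where they differ. It assumes the line formats shown in the excerpts above. The function names are made up for illustration and are not part of CUDAPm1.

```python
import re

PASS_RE = re.compile(r"Processing (\d+) - (\d+) of \d+ relative primes")
# Matches the capitalized "Transforms:" status line, not the init line.
RESIDUE_RE = re.compile(r"Transforms: \d+ M\d+, (0x[0-9a-f]{16})")

def stage2_residues(log: str) -> dict:
    """Map (first_rp, last_rp) of each pass to its reported 64-bit residue."""
    residues = {}
    current = None
    for line in log.splitlines():
        m = PASS_RE.search(line)
        if m:
            current = (int(m.group(1)), int(m.group(2)))
            continue
        m = RESIDUE_RE.search(line)
        if m and current is not None:
            residues[current] = m.group(1)
    return residues

def residue_mismatches(log_a: str, log_b: str) -> dict:
    """Return {group: (residue_a, residue_b)} for groups present in both logs that differ."""
    a, b = stage2_residues(log_a), stage2_residues(log_b)
    return {g: (a[g], b[g]) for g in sorted(a.keys() & b.keys()) if a[g] != b[g]}
```

Groups that appear in only one log (such as 1-4 and 5-8 here) are ignored, so a resumed run can be compared directly against the full original run.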

ET_ 2018-06-24 16:17

[QUOTE=kriesel;490384]CUDAPm1 gives 64-bit residues in stage 1 and stage 2. I thought they would reproduce. [...] Maybe I got the wrong stage one file, not quite finished, and that threw it off somehow? Ideas?[/QUOTE]

IIRC, CUDAPm1 uses the available memory and the type of GPU to choose the optimal magic numbers for P-1 (b1, b2, d, e, nrp). I didn't look at the code, but my guess is that the stage 2 residue to start from was not compatible with, or not correctly reshaped for, the GTX 1050 Ti.

kriesel 2018-06-24 17:37

[QUOTE=ET_;490434]IIRC, CUDAPm1 used the available memory and the type of GPU to define the optimal magic numbers for P-1 (b1, b2, d, e, brp). I didn't look at the code, but I guesstimate that the stage2 residue to start from was not compatible, or not correctly reshaped for the GTX 1050Ti.[/QUOTE]
I had thought it would be safe to go from a smaller-memory GPU to a larger-memory one, since there is more than adequate room to run the combination of bounds, d, e, and nrp that fit on the more restricted GPU. From a fresh start, a GPU with more memory would probably select bigger bounds to take advantage of the extra room, but I think that's an optimization, not a requirement.

The other way around, transplanting a run started on a GPU with more memory to one with less, is likely to fail in stage 2 or perhaps even in stage 1, for the reason you describe. The author said so in [url]http://www.mersenneforum.org/showpost.php?p=359086&postcount=421[/url]. I've also found cases where, start to finish on one GPU, the program selects stage 2 bounds that have no hope of running to successful completion, because they require gigabytes more memory than is available on the GPU on which they were selected. But neither of those corresponds to the case I posted about here.

While looking at read_checkpoint_packed and other routines in order to write a script that exports CUDAPm1 savefiles to a neutral exchange format, I did not see anything stored other than these parameters: nothing explicit about how many ROPs or shaders the GPU had or must have, nor how much memory.
The residue is a word stream, a pretty simple shape. I had the impression that the entire save file consists of 4-byte unsigned integers (with times in seconds). That checked out against the total savefile size, to the byte as I recall.

[CODE]# fread (x_packed, 1, sizeof (unsigned) * (end + 25) , fPtr)
# x_packed[end] = q;
# x_packed[end + 1] = 0; // n
# x_packed[end + 2] = 1; // iteration number
# x_packed[end + 3] = 0; // stage
# x_packed[end + 4] = 0; // accumulated time
# x_packed[end + 5] = 0; // b1
# // 6-9 reserved for extending b1
# // 10-24 reserved for stage 2
# x_packed[end + 10] = b2;
# x_packed[end + 11] = d;
# x_packed[end + 12] = e;
# x_packed[end + 13] = nrp;
# x_packed[end + 14] = 0; // m = number of relative primes already finished
# x_packed[end + 15] = 0; // k = how far done with current crop of relative primes
# x_packed[end + 16] = 0; // t = where to find next relative prime in the bit array
# x_packed[end + 17] = 0; // extra initialization transforms from starting in the middle of a pass
# x_packed[end + 18] = itran_done;
# x_packed[end + 19] = ptran_done + num_tran;
# x_packed[end + 20] = itime;
# x_packed[end + 21] = ptime;
#22-24?[/CODE]Words 0 to end-1 are the packed residue (x_packed). The rest are scalars, which the export program I created reports as follows. Note that these might be from an earlier file than the one I used.
[CODE]Format Mersenne Neutral Exchange d0.4
FileOrigin "CUDAPm1export for Windows" "V0.1 2018-06-23" c425000083s2. 2018 Jun 23 20:52:21 UTC
Type P-1 stage 2
Exponent 425000083
Iteration 4098308
N 24772608
AccumulatedTime 216019
B1 2840000
Reserved6 0
Reserved7 0
Reserved8 0
Reserved9 0
B2 34080000
D 2310
E 2
NRP 4
M 8
K 1229
T 8
Midpasstransforms 0
Itran_done 435
PtrandonePlusNumtran 107880
Itime 12
Ptime 3016
Reserved22 0
Reserved23 0
Reserved24 0
DataFormat binary bytes
CRC32 0x07291d0b
DataBinaryByteCount 53125012
EndOfHeader[/CODE]I see nothing GPU-specific there: no ROP or shader counts, not even the choices of thread counts for the three phases of the computation.
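
The layout above can be sketched as a small reader for the 25 trailing scalar words. This is only a sketch under stated assumptions, not the export script itself: it assumes little-endian 32-bit unsigned words (as on x86 Windows), and it assumes the residue occupies ceil(q / 32) words, which is inferred from the byte count in the exported header rather than confirmed from the CUDAPm1 source. The names parse_trailer and expected_residue_bytes are made up for illustration.

```python
import struct

TRAILER_WORDS = 25  # scalars stored after the packed residue

# Offsets relative to `end`, per the commented layout above.
FIELDS = {
    0: "q", 1: "n", 2: "iteration", 3: "stage", 4: "accumulated_time",
    5: "b1", 10: "b2", 11: "d", 12: "e", 13: "nrp",
    14: "m", 15: "k", 16: "t", 17: "midpass_transforms",
    18: "itran_done", 19: "ptran_done_plus_num_tran",
    20: "itime", 21: "ptime",
}

def parse_trailer(blob: bytes) -> dict:
    """Decode the trailing scalars of a savefile image held in memory."""
    trailer_bytes = 4 * TRAILER_WORDS
    if len(blob) % 4 or len(blob) < trailer_bytes:
        raise ValueError("not a whole number of 32-bit words, or too short")
    # ASSUMPTION: little-endian unsigned 32-bit words.
    words = struct.unpack("<%dI" % TRAILER_WORDS, blob[-trailer_bytes:])
    fields = {name: words[offset] for offset, name in FIELDS.items()}
    fields["residue_bytes"] = len(blob) - trailer_bytes
    return fields

def expected_residue_bytes(q: int) -> int:
    """ASSUMPTION: the residue is packed 32 bits per word, ceil(q / 32) words."""
    return 4 * ((q + 31) // 32)
```

As a consistency check on that assumption, expected_residue_bytes(425000083) gives 53125012, which is the DataBinaryByteCount in the exported header above.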

kriesel 2018-06-25 05:42

prime95 P-1 bug since fixed. Is it present in CUDAPm1?
 
[URL]http://www.mersenneforum.org/showthread.php?t=22776[/URL] describes an issue with prime95 P-1 stage 1 computations that has since been fixed. Old prime95 source code shows the issue goes back at least to v28.5, dated 2014, and perhaps earlier, though the code in v27.7, dated 2012, is different. That does not rule out the issue being present in prime95's P-1 code at the time CUDAPm1 was developed, in 2013 (February to November). Since CUDAPm1 development followed prime95's code as a reference, and CUDAPm1 development and maintenance ended well before the issue was found and fixed in prime95, the issue might also be present in the currently available versions of CUDAPm1.

kriesel 2018-06-27 13:35

B2 reported may not match B2 used
 
In CUDAPm1 v0.20, if a run is continued on a GPU with more memory than the one it was started on, new bounds are calculated, and then the program indicates it will continue with the bounds from the save file. After the run completes, the result record contains the B2 from the selection calculation, not the save file value that the program indicated it used. Example log excerpts follow.


Using threads: norm1 512, mult 256, norm2 128.
Stage 2 checkpoint found.
Using up to 3780M GPU memory.
Selected B1=3100000, B2=[B]62000000[/B], 3.18% chance of finding a factor
Using B1 = 2840000 from savefile.
Continuing stage 2 from a partial result of M425000083 fft length = 24192K
Starting stage 2.
[B]Using b1 = 2840000, b2 = 34080000[/B], d = 2310, e = 2, nrp = 4
Zeros: 1632727, Ones: 1613513, Pairs: 274647
Processing 9 - 12 of 480 relative primes.
Inititalizing pass... done. transforms: 235, err = 0.20313, (9.42 real, 40.0848 ms/tran, ETA 49:43:53)
Transforms: 54058 M425000083, 0xed32e096fa463f09, n = 24192K, CUDAPm1 v0.20 err = 0.25439 (38:33 real, 42.7948 ms/tran, ETA 57:59:19)
...

Processing 477 - 480 of 480 relative primes.
Inititalizing pass... done. transforms: 299, err = 0.21094, (12.89 real, 43.1271 ms/tran, ETA 39:01)
Transforms: 53916 M425000083, 0x7efe91810f60cfa3, n = 24192K, CUDAPm1 v0.20 err = 0.25000 (38:28 real, 42.8098 ms/tran, ETA 0:46)

Stage 2 complete, 6506485 transforms, estimated total time = 76:55:59
Starting stage 2 gcd.
M425000083 Stage 2 found no factor (P-1, B1=2840000, B2=[B]62000000[/B], e=2, n=24192K CUDAPm1 v0.20)
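
A log can be checked for this mismatch mechanically. The sketch below compares the bounds on the "Using b1 = ..., b2 = ..." line with those on the stage 2 result line, after stripping the bold tags added in this post. It assumes the line formats shown in the excerpt (only the "found no factor" form of the result line appears here, so other forms are untested), and the function names are made up for illustration.

```python
import re

USING_RE = re.compile(r"Using b1 = (\d+), b2 = (\d+)")
RESULT_RE = re.compile(r"Stage 2 found .*?\(P-1, B1=(\d+), B2=(\d+)")

def strip_bold(text: str) -> str:
    """Remove the [B]...[/B] forum markup used to highlight values."""
    return re.sub(r"\[/?B\]", "", text)

def bounds_used_and_reported(log: str):
    """Return ((b1_used, b2_used), (b1_reported, b2_reported)), or None if either line is missing."""
    clean = strip_bold(log)
    used = USING_RE.search(clean)
    reported = RESULT_RE.search(clean)
    if not used or not reported:
        return None
    return (
        (int(used.group(1)), int(used.group(2))),
        (int(reported.group(1)), int(reported.group(2))),
    )

def b2_mismatch(log: str) -> bool:
    """True when the reported B2 differs from the B2 the run says it used."""
    pair = bounds_used_and_reported(log)
    return pair is not None and pair[0][1] != pair[1][1]
```

Run against the excerpt above, this flags B2 used = 34080000 versus B2 reported = 62000000, while B1 agrees at 2840000.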

kriesel 2018-07-05 16:24

new to me GPU, new CUDAPm1 behavior seen
 
Based on what a 2 GB Quadro 4000 and a 3 GB GTX 1060 can run, I thought a 2.5 GB Quadro 5000 (which is CC 2.0) would also be able to run exponents up to 300M, perhaps higher, in the CUDAPm1 v0.20 x64 CUDA 5.5 20130923 version. It passed a memory test and correctly found the factor for M50001781.

But it failed to run stage 2 on [CODE]M87771547, 0xf6c7342f2bab37fa, n = 5040K, CUDAPm1 v0.20
Stage 1 complete, estimated total time = 3:35:59
Starting stage 1 gcd.
M87771547 Stage 1 found no factor (P-1, B1=755000, B2=17365000, e=0, n=5040K CUDAPm1 v0.20)
Starting stage 2.
Using b1 = 755000, b2 = 17365000, d = 2310, e = 2, nrp = 48
Zeros: 785147, Ones: 880453, Pairs: 172236
Processing 1 - 48 of 480 relative primes.
Inititalizing pass... )

Quitting, estimated time spent = 0:03
[/CODE]Across repeated restarts, it consistently quit after a few seconds of stage 2, with no reason given.

Same thing occurs on [CODE]M200000491, 0x8ef21dc89a0b7d8c, n = 11250K, CUDAPm1 v0.20
Stage 1 complete, estimated total time = 16:20:44
Starting stage 1 gcd.
M200000491 Stage 1 found no factor (P-1, B1=1540000, B2=32340000, e=0, n=11250K CUDAPm1 v0.20)
Starting stage 2.
Using b1 = 1540000, b2 = 32340000, d = 2310, e = 2, nrp = 16
Zeros: 1515937, Ones: 1585823, Pairs: 290684
Processing 1 - 16 of 480 relative primes.
Inititalizing pass... )

Quitting, estimated time spent = 0:03[/CODE]Again, repeated restarts produce "Quitting" after a few seconds. I'm trying a few other exponents. But for now, CUDAPm1 on this GPU model appears incapable of running stage 2 P-1 at exponents of current or future interest (p>88M), for some unknown reason.

The test exponent 50001781 ran with threads norm1 512, mult 256, norm2 128 and fft length 2688K, neither of which appears in the fft file or threads file.

The applicable threads file entries are, for 88M:
5040 64 64 32 11.5743

and for 200M:
11250 128 64 1024 26.5149

After a retry with 5040 128 64 32 in the threads file, per [URL="http://www.mersenneforum.org/showpost.php?p=359096&postcount=424"]http://www.mersenneforum.org/showpost.php?p=359096&postcount=424[/URL], M88 progresses.
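
The edit made to the 5040 entry can be sketched as a one-line transformation. This assumes the columns of a threads file entry are fft length (K), norm1, mult, norm2, and a timing value, in that order. That order is inferred from the "Using threads: norm1 ..., mult ..., norm2 ..." log line and from the retry above, not taken from the program's documentation. The helper name and the 1024 threads-per-block ceiling check are illustrative additions.

```python
MAX_THREADS_PER_BLOCK = 1024  # CUDA limit on compute capability 2.0 and later

def double_norm1(entry: str) -> str:
    """Return a threads file entry with its norm1 value doubled.

    ASSUMPTION: columns are fft_k, norm1, mult, norm2, then any extras.
    """
    fields = entry.split()
    if len(fields) < 4:
        raise ValueError("expected at least: fft_k norm1 mult norm2")
    norm1 = int(fields[1]) * 2
    if norm1 > MAX_THREADS_PER_BLOCK:
        raise ValueError("norm1 would exceed the threads-per-block limit")
    fields[1] = str(norm1)
    return " ".join(fields)

new_entry = double_norm1("5040 64 64 32 11.5743")
```

The timing column is carried over unchanged, so it no longer reflects a measurement for the new thread counts.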

Any ideas what to do to get M200M running stage 2 successfully?

Are there any CUDA55 or higher executables available with the 20131118 or later code fixes, for Windows?

kriesel 2018-07-19 02:13

new behavior: 16 stage 2 residue values taking turns
 
Anomalous interim stage 2 residues from M350000071 on a Quadro 5000, CUDAPm1 v0.20 20130923 CUDA 5.5 on Windows:

After a normal-looking stage 1, the 120 residues output in stage 2 at NRP=4 are repetitive, drawn from a very limited set of 16 values, listed below in ascending order, and they look suspicious in their regularity. (I'm used to runs with pseudorandom-looking stage 1 and stage 2 residues. This exponent/GPU combination had seemingly well-behaved stage 1 residues but peculiarities throughout stage 2.)
[CODE]
_____8___4___2___1 difference appearing in the respective bit positions
0xfff7fffbfffdfffe
0xfff7fffbfffdffff

0xfff7fffbfffffffe
0xfff7fffbffffffff

0xfff7fffffffdfffe
0xfff7fffffffdffff

0xfff7fffffffffffe
0xfff7ffffffffffff


0xfffffffbfffdfffe
0xfffffffbfffdffff

0xfffffffbfffffffe
0xfffffffbffffffff

0xfffffffffffdfffe
0xfffffffffffdffff

0xfffffffffffffffe
0xffffffffffffffff[/CODE]The end of stage 1 and the beginning of stage 2 looked normal. Stage 2 was using 1863 MB of the 2.5 GB on the GPU. At the stage 2 wrap-up/gcd, usage dropped to 746 MB.
[CODE]
Iteration 3650000 M350000071, 0xfa26579b34919a34, n = 20412K, CUDAPm1 v0.20 err = 0.12109 (20:01 real, 48.0195 ms/iter, ETA 22:37)
Iteration 3675000 M350000071, 0x3ca8420d52bd5a27, n = 20412K, CUDAPm1 v0.20 err = 0.11719 (20:01 real, 48.0155 ms/iter, ETA 2:37)
M350000071, 0x509e08b93355b407, n = 20412K, CUDAPm1 v0.20
Stage 1 complete, estimated total time = 49:05:07
Starting stage 1 gcd.
M350000071 Stage 1 found no factor (P-1, B1=2550000, B2=31875000, e=0, n=20412K CUDAPm1 v0.20)
Starting stage 2.
Using b1 = 2550000, b2 = 31875000, d = 2310, e = 2, nrp = 4
Zeros: 1527348, Ones: 1520172, Pairs: 260423
Processing 1 - 4 of 480 relative primes.
Inititalizing pass... done. transforms: 198, err = 0.11328, (4.77 real, 24.0679 ms/tran, ETA NA)
Transforms: 50660 M350000071, 0xfffffffbfffffffe, n = 20412K, CUDAPm1 v0.20 err = 0.11328 (21:53 real, 25.9248 ms/tran, ETA 43:39:45)

Processing 5 - 8 of 480 relative primes.
Inititalizing pass... done. transforms: 229, err = 0.10547, (5.98 real, 26.1210 ms/tran, ETA 43:42:27)
Transforms: 50812 M350000071, 0xfff7fffbfffdffff, n = 20412K, CUDAPm1 v0.20 err = 0.10547 (21:57 real, 25.9243 ms/tran, ETA 43:19:29)

Processing 9 - 12 of 480 relative primes.
Inititalizing pass... done. transforms: 231, err = 0.10547, (5.99 real, 25.9324 ms/tran, ETA 43:20:31)
Transforms: 50810 M350000071, 0xfff7fffffffdffff, n = 20412K, CUDAPm1 v0.20 err = 0.10547 (21:57 real, 25.9239 ms/tran, ETA 42:57:55)

Processing 13 - 16 of 480 relative primes.
Inititalizing pass... done. transforms: 241, err = 0.10547, (6.24 real, 25.8988 ms/tran, ETA 42:58:31)
Transforms: 50762 M350000071, 0xfff7fffbfffffffe, n = 20412K, CUDAPm1 v0.20 err = 0.10547 (21:56 real, 25.9241 ms/tran, ETA 42:35:58)

Processing 17 - 20 of 480 relative primes.
Inititalizing pass... done. transforms: 247, err = 0.10547, (6.40 real, 25.9017 ms/tran, ETA 42:36:30)
Transforms: 50814 M350000071, 0xfffffffbfffffffe, n = 20412K, CUDAPm1 v0.20 err = 0.10547 (21:57 real, 25.9239 ms/tran, ETA 42:14:22)
[/CODE]And so on. It concluded with a result line reporting no factor found.
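
The pattern in those 16 values is that every residue is all ones except for some subset of four bits. A minimal sketch of a check for that kind of residue follows. The threshold of four zero bits is an arbitrary choice for illustration, and the function names are made up. A pseudorandom 64-bit residue would be expected to have roughly 32 zero bits.

```python
def zero_bits(residue: int) -> list:
    """List the positions (0 = least significant) of the zero bits in a 64-bit residue."""
    return [bit for bit in range(64) if not (residue >> bit) & 1]

def looks_degenerate(residue: int, max_zero_bits: int = 4) -> bool:
    """Flag residues that are almost entirely ones (arbitrary illustrative threshold)."""
    return len(zero_bits(residue)) <= max_zero_bits

suspect = zero_bits(0xfff7fffbfffdfffe)
```

For the smallest listed value, 0xfff7fffbfffdfffe, the zero bits are at positions 0, 17, 34, and 51, matching the 1, 2, 4, 8 header line in the table of values above.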

kriesel 2018-07-19 02:17

[QUOTE=kriesel;491191]
Same thing occurs on [CODE]M200000491, 0x8ef21dc89a0b7d8c, n = 11250K, CUDAPm1 v0.20
Stage 1 complete, estimated total time = 16:20:44
Starting stage 1 gcd.
M200000491 Stage 1 found no factor (P-1, B1=1540000, B2=32340000, e=0, n=11250K CUDAPm1 v0.20)
Starting stage 2.
Using b1 = 1540000, b2 = 32340000, d = 2310, e = 2, nrp = 16
Zeros: 1515937, Ones: 1585823, Pairs: 290684
Processing 1 - 16 of 480 relative primes.
Inititalizing pass... )

Quitting, estimated time spent = 0:03[/CODE] (200M)
11250 128 64 1024 26.5149

Any ideas what to do to get M200M running stage 2 successfully?
[/QUOTE]
Doubling norm1 for the 11250K fft length worked for the 200M exponent.


All times are UTC.