[QUOTE=kriesel;499263]Nonzero (primenet goal) B2 values are being reported as zero.[/QUOTE]
B2=0 in the result was a bug and should be fixed now (it was only a cosmetic bug; the correct B2 was in effect).
[QUOTE]Final residues for primes look odd to me, not a PRP3 of a prime. Is this a side effect of the simultaneous P-1 or what?[/QUOTE]
Yes. The "classic" PRP was done with base 3 (thus the name "PRP3"). In PRP-1 with B1 != 0, the base is no longer 3; it is 3^(2*E*powerSmooth(B1)). For a prime, residue == base. |
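The "residue == base" behaviour can be checked on small Mersenne numbers. What follows is only a minimal sketch, not gpuowl's code. It assumes that E is the Mersenne exponent p, that powerSmooth(B1) is the least common multiple of 1..B1, and that the test squares the base p-1 times. The function names are made up for illustration. Because the base is a perfect square, Euler's criterion gives base^((N-1)/2) == 1 for prime N = 2^p - 1, so base^(2^(p-1)) == base.

```python
from math import gcd


def power_smooth(b1):
    """Assumed meaning of powerSmooth(B1): lcm(1, 2, ..., B1)."""
    acc = 1
    for k in range(2, b1 + 1):
        acc = acc * k // gcd(acc, k)
    return acc


def prp1_base(p, b1):
    """Base 3^(2*E*powerSmooth(B1)) mod 2^p - 1, taking E = p (assumption)."""
    n = (1 << p) - 1
    return pow(3, 2 * p * power_smooth(b1), n)


def prp1_residue(p, b1):
    """Square the base p-1 times modulo 2^p - 1 (assumed iteration count)."""
    n = (1 << p) - 1
    x = prp1_base(p, b1)
    for _ in range(p - 1):
        x = x * x % n
    return x


if __name__ == "__main__":
    for p in (13, 17, 19, 31, 61):  # exponents of known Mersenne primes
        print(p, prp1_residue(p, 200) == prp1_base(p, 200))
    # 2^11 - 1 = 23 * 89 is composite; with B1 = 2 the residue differs from the base
    print(11, prp1_residue(11, 2) == prp1_base(11, 2))
```

The sketch does not try to reproduce gpuowl's res64 values; it only illustrates why a prime exponent ends with the residue equal to the starting base.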
Prime check on gpuowl v3.8
The check completed in 11 minutes on an RX480. This ended the run, although there was more work in the worktodo file. I would prefer that it continue with the remaining work.
[QUOTE]{"exponent":1257787, "worktype":"PRP-3", "status":"P", "program":{"name":"gpuowl", "version":"3.8-91c52fa-OpenCL"}, "timestamp":"2018-11-01 22:18:22 UTC", "user":"kriesel", "computer":"condorella-rx480", "aid":"0", "residue-type":1, "fft-length":"512K", "res64":"0000000000000001", "errors":{"gerbicz":0}} [/QUOTE] V5.0 did not halt, doing one adequately large known prime exponent after another, so that's good. |
gpuowl V4.6-bb691cb
1 Attachment(s)
The only minor change I made was commenting out the following in gpu.cpp, so it would compile. For PRP (no tf or P-1), it runs.
// 10/31/18 kwk static_assert(sizeof(long) == 8, "size long");
I tried to get B1 P-1 to run, but V4.6 ignored every related worktodo syntax I could find in previous posts.
Early results are that V3.3 is slower than V3.5; V3.5 to V3.8 are about equal to each other, and faster than V3.9; V3.9 is usually very slightly faster than V4.6; V4.6 is faster than V5.0. Part of V5.0's lag is switching sooner to larger fft lengths; I selected a list of test exponents to check for GEC issues near the upper limits of fft lengths back at V3.3 and V3.5, according to their approximate limits.
It's not clear what the utility of "GCD over 0 points" is. Maybe it is just print statements executing regardless of whether P-1 occurs or not. Sample run:
[CODE]2018-11-01 14:33:33 condorella-rx480 gpuowl 4.6-bb691cb
2018-11-01 14:33:33 condorella-rx480 FFT 20480K: Width 1024 (256x4), Height 2048 (256x8), Middle 5; 17.69 bits/word
2018-11-01 14:33:34 condorella-rx480 Note: using short carry kernels
2018-11-01 14:33:35 condorella-rx480 Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
2018-11-01 14:33:38 condorella-rx480 OpenCL compilation in 3229 ms, with "-DEXP=371000039u -DWIDTH=1024u -DSMALL_HEIGHT=2048u -DMIDDLE=5u -I. -cl-fast-relaxed-math -cl-std=CL2.0 "
2018-11-01 14:33:41 condorella-rx480 PRP M(371000039), FFT 20480K, 17.69 bits/word, B1 0
2018-11-01 14:33:51 condorella-rx480 OK loaded: 0/371000039, B1 0, blockSize 400, 0000000000000003 (expected 0000000000000003)
2018-11-01 14:33:51 condorella-rx480 Selected 0 P-1 trial points
2018-11-01 14:34:16 condorella-rx480 OK 800/371000039 [ 0.00%], 19.23 ms/it [19.23, 19.23]; ETA 82d 13:15; e2ac5ec2a6819689 (check 9.19s)
2018-11-01 14:37:14 condorella-rx480 10000/371000039 [ 0.00%], 19.34 ms/it [19.24, 20.98]; ETA 83d 01:23; edb6fd2abeb3f16c
2018-11-01 14:40:28 condorella-rx480 20000/371000039 [ 0.01%], 19.38 ms/it [19.29, 20.98]; ETA 83d 05:29; 8d715d688ed88463
2018-11-01 14:40:43 condorella-rx480 Stopping, please wait..
2018-11-01 14:40:52 condorella-rx480 OK 20800/371000039 [ 0.01%], 19.17 ms/it [18.99, 19.34]; ETA 82d 07:13; d6286177c0386521 (check 9.41s)
2018-11-01 14:40:52 condorella-rx480 Starting GCD over 0 points
2018-11-01 14:40:53 condorella-rx480 Waiting for GCD to finish..
2018-11-01 14:40:53 condorella-rx480 Exiting because "stop requested"
2018-11-01 14:40:53 condorella-rx480 Bye[/CODE] |
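The bits/word and ETA figures in a log like this can be roughly reproduced from the exponent, the FFT length and the ms/it value. This is a small sketch, assuming "K" means 1024 words and that the ETA is simply the remaining iterations times the reported ms/it. The helper names are hypothetical and gpuowl may round or average differently.

```python
def bits_per_word(exponent, fft_k):
    """Exponent bits spread over an FFT of fft_k * 1024 words (assumed meaning of 'K')."""
    return exponent / (fft_k * 1024)


def eta_days(exponent, done, ms_per_it):
    """Remaining iterations times ms/it, converted to days (assumed ETA formula)."""
    return (exponent - done) * ms_per_it / 1000 / 86400


if __name__ == "__main__":
    print(round(bits_per_word(371000039, 20480), 2))    # 20480K FFT run
    print(round(bits_per_word(1257787, 512), 2))        # 512K FFT run
    print(round(eta_days(371000039, 800, 19.23), 1))    # ETA at iteration 800
```

For M(371000039) at 20480K this gives 17.69 bits/word, matching the log, and about 82.6 days remaining at 19.23 ms/it, close to the logged ETA of 82d 13:15.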
gpuowl V5.0-f604bb1 for Win7-x64 AMD opencl 2.0
1 Attachment(s)
This is the result of the Oct 31 commit and does not have the prp3 P fix or B2!=0 fix. It's been a long day getting to this point. Perhaps someone could beat on this code and find an issue or two that have eluded us so far, and report any to Preda or here in this thread.
|
Some benchmarking, V3.x, 4.6, 5.0
Today's benchmarking results on RX480 in Windows for V4.6 and 5.0 and a bit more in 3.x have been added. See the second attachment at [URL]https://www.mersenneforum.org/showpost.php?p=488535&postcount=2[/URL]
|
gpuowl V5.0-f604bb1 for Win7-x64 AMD opencl 2.0
1 Attachment(s)
Oops, I had too many "manage attachments" windows open and posted the wrong one in post 829.
_This_ attachment hopefully is the result of the Oct 31 V5.0 commit and does not have the prp3 P fix or B2!=0 fix. It's been a long day getting to this point. Perhaps someone could beat on this code and find an issue or two that have eluded us so far, and report any to Preda or here in this thread. |
openowl V4.3-537c681
1 Attachment(s)
Similarly to v4.6, commented out an assert in gpu.cpp and it compiled. The only testing I've done on this build was a run of m1257787 with -B1 200 on the command line.
[CODE]C:\msys64\home\ken\gpuowl-compile\v4.3>openowl -B1 200
2018-11-02 09:22:12 gpuowl 4.3-537c681
2018-11-02 09:22:12 FFT 512K: Width 512 (64x8), Height 512 (64x8); 2.40 bits/word
2018-11-02 09:22:12 Note: using long carry kernels
2018-11-02 09:22:13 Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
2018-11-02 09:22:17 OpenCL compilation in 3861 ms, with "-DEXP=1257787u -DWIDTH=512u -DSMALL_HEIGHT=512u -DMIDDLE=1u -I. -cl-fast-relaxed-math -cl-std=CL2.0 "
2018-11-02 09:22:17 PRP M(1257787), FFT 512K, 2.40 bits/word, B1 200
2018-11-02 09:22:17 Starting P-1 first-stage GCD
2018-11-02 09:22:18 OK loaded: 0/1257787, B1 200, blockSize 400, 62dc0af77c7b5e45 (expected 62dc0af77c7b5e45)
2018-11-02 09:22:19 GCD says: still no factor
2018-11-02 09:22:19 OK 800/1257787 [ 0.06%], 1.70 ms/it [0.53, 2.87]; ETA 0d 00:36; 49283f488d1734a4 (check 0.23s)
2018-11-02 09:22:24 10000/1257787 [ 0.79%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:11; 8093e635acf1cfb6
2018-11-02 09:22:29 20000/1257787 [ 1.59%], 0.51 ms/it [0.51, 0.53]; ETA 0d 00:11; f876ff6c60cfdefb
2018-11-02 09:22:34 30000/1257787 [ 2.38%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:11; 2e84cedbb7da818a
2018-11-02 09:22:39 40000/1257787 [ 3.18%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:10; 71f45d92aae292bd
2018-11-02 09:22:45 50000/1257787 [ 3.97%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:10; 1fd28086ab6826a0
2018-11-02 09:22:50 60000/1257787 [ 4.77%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:10; c2b180c2f9b281b7
2018-11-02 09:22:55 70000/1257787 [ 5.56%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:10; 22138c0bf6a96a9a
2018-11-02 09:23:00 80000/1257787 [ 6.36%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:10; 7ae80137efaa8689
2018-11-02 09:23:05 90000/1257787 [ 7.15%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:10; e97c39f5383fe107
2018-11-02 09:23:10 100000/1257787 [ 7.95%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:10; e34f0813e6847496
2018-11-02 09:23:15 110000/1257787 [ 8.74%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:10; 1f145f046a7eaba7
2018-11-02 09:23:21 120000/1257787 [ 9.54%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:10; 056ef16631c69eaa
2018-11-02 09:23:26 130000/1257787 [10.33%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:10; b819516ee5b773d8
2018-11-02 09:23:31 140000/1257787 [11.13%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:10; 4ec0e61c87765e83
2018-11-02 09:23:36 150000/1257787 [11.92%], 0.51 ms/it [0.51, 0.53]; ETA 0d 00:10; 4afe585b6826d6ea
2018-11-02 09:23:41 OK 160000/1257787 [12.72%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:09; 102a9f8f606a9969 (check 0.23s)
2018-11-02 09:23:47 170000/1257787 [13.51%], 0.51 ms/it [0.51, 0.53]; ETA 0d 00:09; 6948c87a300c0323
2018-11-02 09:23:52 180000/1257787 [14.31%], 0.52 ms/it [0.50, 0.55]; ETA 0d 00:09; a80d8c720e0397a0
2018-11-02 09:23:57 190000/1257787 [15.10%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:09; ca41166d67c187c0
2018-11-02 09:24:02 200000/1257787 [15.90%], 0.52 ms/it [0.50, 0.55]; ETA 0d 00:09; 9c63150ece06844f
2018-11-02 09:24:07 210000/1257787 [16.69%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:09; 32a7bba6c1712924
2018-11-02 09:24:12 220000/1257787 [17.49%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:09; d56b09504be1099a
2018-11-02 09:24:18 230000/1257787 [18.28%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:09; 3b3f8813a99b3bab
2018-11-02 09:24:23 240000/1257787 [19.08%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:09; 11a270ba2de54e92
2018-11-02 09:24:28 250000/1257787 [19.87%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:09; 85540b740a9392d2
2018-11-02 09:24:33 260000/1257787 [20.67%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:08; 1d9c171966185cb5
2018-11-02 09:24:38 270000/1257787 [21.46%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:08; e3e8928dce8c4def
2018-11-02 09:24:43 280000/1257787 [22.26%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:08; 6b41102946646d7e
2018-11-02 09:24:48 290000/1257787 [23.05%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:08; 6c7a6eb5feb62a7a
2018-11-02 09:24:54 300000/1257787 [23.85%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:08; a618fd79e0918820
2018-11-02 09:24:59 310000/1257787 [24.64%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:08; 027b746c63f990ff
2018-11-02 09:25:04 OK 320000/1257787 [25.44%], 0.52 ms/it [0.50, 0.55]; ETA 0d 00:08; 836bb30e924e5458 (check 0.23s)
2018-11-02 09:25:09 330000/1257787 [26.23%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:08; e2bdfedf3209f7c6
2018-11-02 09:25:14 340000/1257787 [27.03%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:08; 33f782cceecaadc5
2018-11-02 09:25:20 350000/1257787 [27.82%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:08; 19ff5285cda8a38a
2018-11-02 09:25:25 360000/1257787 [28.62%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:08; 06a4d584a35f58f0
2018-11-02 09:25:30 370000/1257787 [29.41%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:08; 9e81d8b362ac3f23
2018-11-02 09:25:35 380000/1257787 [30.21%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:08; 8971f1a6c177c1b3
2018-11-02 09:25:40 390000/1257787 [31.00%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:07; 668095c791414457
2018-11-02 09:25:45 400000/1257787 [31.80%], 0.51 ms/it [0.50, 0.56]; ETA 0d 00:07; b148a85bec77cd86
2018-11-02 09:25:51 410000/1257787 [32.59%], 0.52 ms/it [0.51, 0.55]; ETA 0d 00:07; d19e0e57c86152fe
2018-11-02 09:25:56 420000/1257787 [33.39%], 0.51 ms/it [0.51, 0.53]; ETA 0d 00:07; a0df442bc41ac61c
2018-11-02 09:26:01 430000/1257787 [34.18%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:07; 2eed708adb5ebaa9
2018-11-02 09:26:06 440000/1257787 [34.98%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:07; 4d1557a7c215ad9c
2018-11-02 09:26:11 450000/1257787 [35.77%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:07; 5af120c5e6d6fd2c
2018-11-02 09:26:16 460000/1257787 [36.57%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:07; 5ecc4128fac1c94d
2018-11-02 09:26:22 470000/1257787 [37.36%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:07; aa4f13d1d3b1f375
2018-11-02 09:26:27 OK 480000/1257787 [38.16%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:07; 49a1d8aa8cb20a19 (check 0.23s)
2018-11-02 09:26:32 490000/1257787 [38.95%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:07; cfe1f6602e7bd39c
2018-11-02 09:26:37 500000/1257787 [39.75%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:07; d1331bffeb965c61
2018-11-02 09:26:42 510000/1257787 [40.54%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:06; ef3905d00237a8ff
2018-11-02 09:26:48 520000/1257787 [41.34%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:06; b3ae3eac3dd84e58
2018-11-02 09:26:53 530000/1257787 [42.13%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:06; 67b4b7fb8315d59c
2018-11-02 09:26:58 540000/1257787 [42.93%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:06; 3bab9e13ed7beb8b
2018-11-02 09:27:03 550000/1257787 [43.72%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:06; e02e0b2c4e8a41ed
2018-11-02 09:27:08 560000/1257787 [44.52%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:06; 9c75b983083b39e7
2018-11-02 09:27:13 570000/1257787 [45.31%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:06; 2cb5ee3b2cd9cbc2
2018-11-02 09:27:18 580000/1257787 [46.10%], 0.51 ms/it [0.51, 0.53]; ETA 0d 00:06; eae845ead78fccf6
2018-11-02 09:27:24 590000/1257787 [46.90%], 0.51 ms/it [0.51, 0.53]; ETA 0d 00:06; 6ecd5bddb56b159a
2018-11-02 09:27:29 600000/1257787 [47.69%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:06; b17567d3c846ec14
2018-11-02 09:27:34 610000/1257787 [48.49%], 0.52 ms/it [0.51, 0.54]; ETA 0d 00:06; c7eee3e600cfd4e4
2018-11-02 09:27:39 620000/1257787 [49.28%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:05; 46da87fbdf1f0ef0
2018-11-02 09:27:44 630000/1257787 [50.08%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:05; d681dd33c0e26121
2018-11-02 09:27:50 OK 640000/1257787 [50.87%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:05; 5407ff4901212ebe (check 0.23s)
2018-11-02 09:27:55 650000/1257787 [51.67%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:05; 387f070197d9e625
2018-11-02 09:28:00 660000/1257787 [52.46%], 0.51 ms/it [0.51, 0.53]; ETA 0d 00:05; 847d618a9fd8cdbc
2018-11-02 09:28:05 670000/1257787 [53.26%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:05; 34d5fc9e6dc12cf3
2018-11-02 09:28:10 680000/1257787 [54.05%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:05; 53155109fc3602a6
2018-11-02 09:28:15 690000/1257787 [54.85%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:05; 84a3d821b89aad62
2018-11-02 09:28:20 700000/1257787 [55.64%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:05; f5068734c454faa8
2018-11-02 09:28:26 710000/1257787 [56.44%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:05; a4297276b82798c9
2018-11-02 09:28:31 720000/1257787 [57.23%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:05; ef05cab9b877532b
2018-11-02 09:28:36 730000/1257787 [58.03%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:05; b5b7dc1a2dd24d29
2018-11-02 09:28:41 740000/1257787 [58.82%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:04; aee981b6e6a3acf6
2018-11-02 09:28:46 750000/1257787 [59.62%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:04; d65b0a3817c57abb
2018-11-02 09:28:51 760000/1257787 [60.41%], 0.52 ms/it [0.51, 0.52]; ETA 0d 00:04; b9a3ce8f9a4a5c56
2018-11-02 09:28:56 770000/1257787 [61.21%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:04; 27172fc29aaff8bb
2018-11-02 09:29:02 780000/1257787 [62.00%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:04; 6e1c86d8d421ed40
2018-11-02 09:29:07 790000/1257787 [62.80%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:04; b7eed42cede94986
2018-11-02 09:29:12 OK 800000/1257787 [63.59%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:04; 1cdb3292873b5558 (check 0.24s)
2018-11-02 09:29:17 810000/1257787 [64.39%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:04; 7d9316a37bfab070
2018-11-02 09:29:22 820000/1257787 [65.18%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:04; 5382de0521f4d255
2018-11-02 09:29:27 830000/1257787 [65.98%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:04; 8e394efdf7cd9510
2018-11-02 09:29:33 840000/1257787 [66.77%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:04; 3d7a0759da92dd65
2018-11-02 09:29:38 850000/1257787 [67.57%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:03; 5f23f2eb0f309b92
2018-11-02 09:29:43 860000/1257787 [68.36%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:03; d1fd9defb926037b
2018-11-02 09:29:48 870000/1257787 [69.16%], 0.52 ms/it [0.51, 0.52]; ETA 0d 00:03; ea1263abf2c765e1
2018-11-02 09:29:53 880000/1257787 [69.95%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:03; 6e6a097ea4f341b2
2018-11-02 09:29:58 890000/1257787 [70.75%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:03; f3b51333a0d553ac
2018-11-02 09:30:04 900000/1257787 [71.54%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:03; 777540db62beb9a6
2018-11-02 09:30:09 910000/1257787 [72.34%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:03; 43fd78454ecc3c50
2018-11-02 09:30:14 920000/1257787 [73.13%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:03; d3a89c3b55a4d991
2018-11-02 09:30:19 930000/1257787 [73.93%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:03; 103525740c060472
2018-11-02 09:30:24 940000/1257787 [74.72%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:03; 93d4eff259dde7bd
2018-11-02 09:30:29 950000/1257787 [75.52%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:03; 4a8c90a1e800cefe
2018-11-02 09:30:35 OK 960000/1257787 [76.31%], 0.51 ms/it [0.51, 0.53]; ETA 0d 00:03; 6b998a4097ffd41e (check 0.23s)
2018-11-02 09:30:40 970000/1257787 [77.11%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:02; 5ac82bde294904e8
2018-11-02 09:30:45 980000/1257787 [77.90%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:02; 1b937113ce55edb0
2018-11-02 09:30:50 990000/1257787 [78.70%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:02; 88518875f92661c5
2018-11-02 09:30:55 1000000/1257787 [79.49%], 0.51 ms/it [0.51, 0.53]; ETA 0d 00:02; 498b633d92b92722
2018-11-02 09:31:00 1010000/1257787 [80.29%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:02; 9b84a4a2627af8ca
2018-11-02 09:31:05 1020000/1257787 [81.08%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:02; 09c01bef03a22632
2018-11-02 09:31:11 1030000/1257787 [81.88%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:02; 86e99dda1efc8dc3
2018-11-02 09:31:16 1040000/1257787 [82.67%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:02; 0167d4f540311245
2018-11-02 09:31:21 1050000/1257787 [83.47%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:02; 54bcffd627fd4732
2018-11-02 09:31:26 1060000/1257787 [84.26%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:02; e98afd01f99bf64e
2018-11-02 09:31:31 1070000/1257787 [85.06%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:02; dc6312680e7c5297
2018-11-02 09:31:36 1080000/1257787 [85.85%], 0.51 ms/it [0.51, 0.53]; ETA 0d 00:02; 1180f3968def2760
2018-11-02 09:31:42 1090000/1257787 [86.65%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:01; 9b0be9d76b160ce8
2018-11-02 09:31:47 1100000/1257787 [87.44%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:01; 0dbdbb536300bdc2
2018-11-02 09:31:52 1110000/1257787 [88.24%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:01; 7e24e68805544414
2018-11-02 09:31:57 OK 1120000/1257787 [89.03%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:01; e469ae8eb668ee4e (check 0.23s)
2018-11-02 09:32:02 1130000/1257787 [89.83%], 0.51 ms/it [0.51, 0.54]; ETA 0d 00:01; 8c6cfaabfea0b578
2018-11-02 09:32:08 1140000/1257787 [90.62%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:01; bf4616487bc7da8c
2018-11-02 09:32:13 1150000/1257787 [91.41%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:01; 7662cf174e295762
2018-11-02 09:32:18 1160000/1257787 [92.21%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:01; d389c12ee0baf5e3
2018-11-02 09:32:23 1170000/1257787 [93.00%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:01; fc3823e4ddbe2808
2018-11-02 09:32:28 1180000/1257787 [93.80%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:01; bdc5dc18c936bb78
2018-11-02 09:32:33 1190000/1257787 [94.59%], 0.52 ms/it [0.50, 0.55]; ETA 0d 00:01; f10f9a7452fe8144
2018-11-02 09:32:38 1200000/1257787 [95.39%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:00; 49adedb139c60c75
2018-11-02 09:32:44 1210000/1257787 [96.18%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:00; d470b0903ef59a47
2018-11-02 09:32:49 1220000/1257787 [96.98%], 0.52 ms/it [0.50, 0.55]; ETA 0d 00:00; 797393e12c2235ce
2018-11-02 09:32:54 1230000/1257787 [97.77%], 0.52 ms/it [0.50, 0.55]; ETA 0d 00:00; 67123f8079e5adbd
2018-11-02 09:32:59 1240000/1257787 [98.57%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:00; 603a4c2d81d7e90a
2018-11-02 09:33:04 1250000/1257787 [99.36%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:00; a3256d3e3efc58e0
2018-11-02 09:33:08 PP 1257786 / 1257787, 62dc0af77c7b5e45 (base 62dc0af77c7b5e45)
2018-11-02 09:33:09 OK 1258000/1257787 [100.00%], 0.52 ms/it [0.50, 0.59]; ETA 0d 00:00; 03d3485e5f7cdec6 (check 0.23s)
2018-11-02 09:33:09 {"exponent":"1257787", "worktype":"PRP-1", "status":"P", "program":{"name":"gpuowl", "version":"4.3-537c681"}, "timestamp":"2018-11-02 14:33:09 UTC", "aid":"0", "res64":"62dc0af77c7b5e45", "base":{"b1":"200", "bias":{"2":19}, "res64":"62dc0af77c7b5e45"}}
2018-11-02 09:33:09 Bye[/CODE] |
[QUOTE=kriesel;499346]Similarly to v4.6, commented out an assert in gpu.cpp and it compiled. [/QUOTE]
Thanks, but why do you test 4.3 or 4.6 instead of the more recent 5.x? (Maybe some issues have been fixed in 5.x. And maybe I introduced new bugs in 5.x too.) |
[QUOTE=preda;499381]Thanks, but why do you test 4.3 or 4.6 instead of the more recent 5.x?
(Maybe some issues have been fixed in 5.x. And maybe I introduced new bugs in 5.x too.)[/QUOTE] I tested v4.3 a total of one little 11 minute m1257787 run to see if my edit left it still functional. Why would you object to that? Worked through a backlog of builds, giving the newest at the time (V5.0-f604bb1) priority. Execution speed seems to me to have been declining since v3.8, and V5.0 apparently continues that trend even when no P-1 is done. [URL]https://www.mersenneforum.org/showpost.php?p=499306&postcount=830[/URL] I'm still running V3.8 for production because of its speed advantage. I think I'm not alone in that. I may run V5 for exponents needing P-1, after it's shown sufficiently reliable, if the run time of PRP-1 in V5 is shown to be less than the run time of P-1 in CUDAPm1 plus PRP in V3.8 separately on comparable gpus. My recent benchmarking indicates V5 is often 4-5% slower (PRP only, not PRP-1) than V3.8. That difference is longer than an entire typical P-1, which is 2-2.5% of a CUDALucas LL test, in the 100M to 500M range, unless PRP was ~ twice as fast as LL. Case in point: 87m P-1 on GTX1060, ~5 hours, PRP in V3.8, 3d 22 hours; so combined time is 4d 3h. Compare to 4d 12h for gpuowl V5, 9 hours slower. Again: 171m P-1 on GTX1060, ~20 hours, PRP in V3.8, 14d 21h, combined 15d 17 h; Estimated V5.0 PRP time 15d 16h. If PRP-1 is within ~1 hour of PRP, V5 wins. |
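The two timing comparisons in this post are plain duration arithmetic and can be checked directly. The sketch below uses only the figures quoted above; the variable names are illustrative.

```python
from datetime import timedelta

# 87M case: separate P-1 (~5 h) plus PRP in V3.8 (3d 22h), versus gpuowl V5 (4d 12h)
combined_87m = timedelta(hours=5) + timedelta(days=3, hours=22)
v5_87m = timedelta(days=4, hours=12)
slower_87m = v5_87m - combined_87m

# 171M case: separate P-1 (~20 h) plus PRP in V3.8 (14d 21h), versus estimated V5.0 PRP (15d 16h)
combined_171m = timedelta(hours=20) + timedelta(days=14, hours=21)
v5_prp_171m = timedelta(days=15, hours=16)
margin_171m = combined_171m - v5_prp_171m

if __name__ == "__main__":
    print("87M combined:", combined_87m, "| V5 slower by:", slower_87m)
    print("171M combined:", combined_171m, "| margin left for PRP-1 overhead:", margin_171m)
```

The 87M case comes out at 4d 3h combined, so V5 is 9 hours slower. The 171M case comes out at 15d 17h combined, which leaves V5 about 1 hour of headroom for the extra cost of PRP-1 over plain PRP.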
[QUOTE=kriesel;499410]I tested v4.3 a total of one little 11 minute m1257787 run to see if my edit left it still functional. Why would you object to that?
Worked through a backlog of builds, giving the newest at the time (V5.0-f604bb1) priority. Execution speed seems to me to have been declining since v3.8, and V5.0 apparently continues that trend even when no P-1 is done. [URL]https://www.mersenneforum.org/showpost.php?p=499306&postcount=830[/URL] I'm still running V3.8 for production because of its speed advantage. I think I'm not alone in that. I may run V5 for exponents needing P-1, after it's shown sufficiently reliable, if the run time of PRP-1 in V5 is shown to be less than the run time of P-1 in CUDAPm1 plus PRP in V3.8 separately on comparable gpus. My recent benchmarking indicates V5 is often 4-5% slower (PRP only, not PRP-1) than V3.8. That difference is longer than an entire typical P-1, which is 2-2.5% of a CUDALucas LL test, in the 100M to 500M range, unless PRP was ~ twice as fast as LL. Case in point: 87m P-1 on GTX1060, ~5 hours, PRP in V3.8, 3d 22 hours; so combined time is 4d 3h. Compare to 4d 12h for gpuowl V5, 9 hours slower. Again: 171m P-1 on GTX1060, ~20 hours, PRP in V3.8, 14d 21h, combined 15d 17 h; Estimated V5.0 PRP time 15d 16h. If PRP-1 is within ~1 hour of PRP, V5 wins.[/QUOTE] The fastest version was 3.5, performance regression after that, and little performance recovery in 4.6. |
[QUOTE=kriesel;499410]I tested v4.3 a total of one little 11 minute m1257787 run to see if my edit left it still functional. Why would you object to that?
Worked through a backlog of builds, giving the newest at the time (V5.0-f604bb1) priority. Execution speed seems to me to have been declining since v3.8, and V5.0 apparently continues that trend even when no P-1 is done. [URL]https://www.mersenneforum.org/showpost.php?p=499306&postcount=830[/URL] I'm still running V3.8 for production because of its speed advantage. I think I'm not alone in that. I may run V5 for exponents needing P-1, after it's shown sufficiently reliable, if the run time of PRP-1 in V5 is shown to be less than the run time of P-1 in CUDAPm1 plus PRP in V3.8 separately on comparable gpus. My recent benchmarking indicates V5 is often 4-5% slower (PRP only, not PRP-1) than V3.8. That difference is longer than an entire typical P-1, which is 2-2.5% of a CUDALucas LL test, in the 100M to 500M range, unless PRP was ~ twice as fast as LL. Case in point: 87m P-1 on GTX1060, ~5 hours, PRP in V3.8, 3d 22 hours; so combined time is 4d 3h. Compare to 4d 12h for gpuowl V5, 9 hours slower. Again: 171m P-1 on GTX1060, ~20 hours, PRP in V3.8, 14d 21h, combined 15d 17 h; Estimated V5.0 PRP time 15d 16h. If PRP-1 is within ~1 hour of PRP, V5 wins.[/QUOTE] I have opened a new issue: [url]https://github.com/preda/gpuowl/issues/17[/url] |