mersenneforum.org

gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

preda 2018-11-01 22:25

[QUOTE=kriesel;499263]Nonzero (primenet goal) B2 values are being reported as zero.[/QUOTE]

B2=0 in the result was a bug; it should be fixed now. (It was just a cosmetic bug: the right B2 was in effect.)

[QUOTE]Final residues for primes look odd to me, not a PRP3 of a prime. Is this a side effect of the simultaneous P-1 or what?[/QUOTE]

Yes. The "classic" PRP was done with base 3 (hence the name "PRP3"). In PRP-1 with B1 != 0, the base is no longer 3; it is 3^(2*E*powerSmooth(B1)). For a prime, the residue equals the base.
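A minimal Python sketch of why this happens, using small Mersenne numbers instead of real exponents. Assumptions, not taken from gpuowl's source: E is the exponent p; powerSmooth(B1) is the usual product of the largest prime powers not exceeding B1; residue type 1 is the Fermat residue 3^(Mp-1) mod Mp (consistent with the res64 of 0000000000000001 in the v3.8 result below); and, judging by the "PP 1257786 / 1257787 ... (base ...)" line in the v4.3 log below, the PRP-1 value compared with the base is the one after p-1 squarings. The reasoning step: the base is a square, b = c^2 with c = 3^(E*powerSmooth(B1)), so b^(2^(p-1)) = c^(2^p) = c^(Mp+1), which reduces to c^2 = b whenever Mp is prime (Fermat's little theorem). All function names are hypothetical.

```python
# Sketch only: helper names are hypothetical, not gpuowl's API.


def power_smooth(b1: int) -> int:
    """Assumed definition: product over primes q <= b1 of the largest q**k <= b1."""
    is_prime = [True] * (b1 + 1)
    result = 1
    for q in range(2, b1 + 1):
        if not is_prime[q]:
            continue
        for j in range(q * q, b1 + 1, q):
            is_prime[j] = False
        qk = q
        while qk * q <= b1:  # largest power of q not exceeding b1
            qk *= q
        result *= qk
    return result


def classic_prp3_residue(p: int) -> int:
    """Fermat residue 3^(Mp-1) mod Mp (assumed meaning of residue type 1)."""
    m = (1 << p) - 1
    return pow(3, m - 1, m)


def prp1_base(p: int, b1: int) -> int:
    """Base 3^(2*E*powerSmooth(B1)) mod Mp, taking E = p.

    Only meaningful for B1 != 0; per the post above, B1 = 0 keeps base 3.
    """
    m = (1 << p) - 1
    return pow(3, 2 * p * power_smooth(b1), m)


def residue_after_p_minus_1_squarings(p: int, base: int) -> int:
    """Square the base p-1 times: base^(2^(p-1)) mod Mp."""
    m = (1 << p) - 1
    x = base
    for _ in range(p - 1):
        x = x * x % m
    return x


if __name__ == "__main__":
    for p in (13, 17, 19, 31, 61, 89, 107, 127):  # Mp is prime for each of these
        b = prp1_base(p, 200)
        print(p, classic_prp3_residue(p) == 1,
              residue_after_p_minus_1_squarings(p, b) == b)
    # M11 = 23 * 89 is composite; with B1 = 1 neither check passes.
    b = prp1_base(11, 1)
    print(11, classic_prp3_residue(11) == 1,
          residue_after_p_minus_1_squarings(11, b) == b)
```

With a large enough B1 a composite can pass the same check trivially: for M11 and B1 = 200 the base is already 1, because P-1 has caught both factors at once, and a GCD of base - 1 with Mp is what exposes that case.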

kriesel 2018-11-01 22:30

Prime check on gpuowl v3.8
 
It took 11 minutes on an RX480. The prime result ended the run, although there was more work in the worktodo file. I'd prefer that work continue.

[QUOTE]{"exponent":1257787, "worktype":"PRP-3", "status":"P", "program":{"name":"gpuowl", "version":"3.8-91c52fa-OpenCL"}, "timestamp":"2018-11-01 22:18:22 UTC", "user":"kriesel", "computer":"condorella-rx480", "aid":"0", "residue-type":1, "fft-length":"512K", "res64":"0000000000000001", "errors":{"gerbicz":0}}
[/QUOTE]
V5.0 did not halt; it ran one adequately large known-prime exponent after another, so that's good.
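For anyone scanning results for prime reports like the one above: each result is a single JSON object on one line. A minimal sketch with a hypothetical helper, assuming "status":"P" marks a probable prime (both prime results in this thread carry it). Note that v3.8 writes the exponent as a number while v4.3 writes it as a string, so the sketch normalizes it; the v4.3 line also has no "residue-type" field.

```python
import json


def parse_result(line: str) -> dict:
    """Extract a few fields from one gpuowl JSON result line (hypothetical helper).

    Anything before the first '{' (e.g. the timestamp the v4.3 log prints in
    front of its result) is skipped.
    """
    record = json.loads(line[line.index("{"):])
    return {
        # v3.8: "exponent":1257787; v4.3: "exponent":"1257787".
        "exponent": int(record["exponent"]),
        "worktype": record["worktype"],
        "probable_prime": record["status"] == "P",
        "res64": record["res64"],
        "residue_type": record.get("residue-type"),  # None when absent
    }
```

Feeding it the v3.8 line above yields exponent 1257787, worktype "PRP-3", probable_prime True and res64 "0000000000000001".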

kriesel 2018-11-01 22:39

gpuowl V4.6-bb691cb
 
1 Attachment(s)
The only change I made, a minor one, was to comment out the following line in gpu.cpp so that it would compile. For PRP (no TF or P-1), it runs.
[CODE]// 10/31/18 kwk static_assert(sizeof(long) == 8, "size long");[/CODE]

I tried to get P-1 (with B1) to run, but V4.6 ignored every related worktodo syntax I could find in previous posts.

Early results: V3.3 is slower than V3.5; V3.5 through V3.8 are about equal to each other and faster than V3.9; V3.9 is usually very slightly faster than V4.6; and V4.6 is faster than V5.0. Part of V5.0's lag comes from switching to larger FFT lengths sooner. Back at V3.3 and V3.5, I selected a list of test exponents near the upper limits of the FFT lengths (according to their approximate limits) to check for GEC issues.

It's not clear what the utility of a "GCD over 0 points" is; maybe it's just print statements executing whether or not P-1 occurs. Sample run:
[CODE]2018-11-01 14:33:33 condorella-rx480 gpuowl 4.6-bb691cb
2018-11-01 14:33:33 condorella-rx480 FFT 20480K: Width 1024 (256x4), Height 2048 (256x8), Middle 5; 17.69 bits/word
2018-11-01 14:33:34 condorella-rx480 Note: using short carry kernels
2018-11-01 14:33:35 condorella-rx480 Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
2018-11-01 14:33:38 condorella-rx480 OpenCL compilation in 3229 ms, with "-DEXP=371000039u -DWIDTH=1024u -DSMALL_HEIGHT=2048u -DMIDDLE=5u -I. -cl-fast-relaxed-math -cl-std=CL2.0 "
2018-11-01 14:33:41 condorella-rx480 PRP M(371000039), FFT 20480K, 17.69 bits/word, B1 0
2018-11-01 14:33:51 condorella-rx480 OK loaded: 0/371000039, B1 0, blockSize 400, 0000000000000003 (expected 0000000000000003)
2018-11-01 14:33:51 condorella-rx480 Selected 0 P-1 trial points
2018-11-01 14:34:16 condorella-rx480 OK 800/371000039 [ 0.00%], 19.23 ms/it [19.23, 19.23]; ETA 82d 13:15; e2ac5ec2a6819689 (check 9.19s)
2018-11-01 14:37:14 condorella-rx480 10000/371000039 [ 0.00%], 19.34 ms/it [19.24, 20.98]; ETA 83d 01:23; edb6fd2abeb3f16c
2018-11-01 14:40:28 condorella-rx480 20000/371000039 [ 0.01%], 19.38 ms/it [19.29, 20.98]; ETA 83d 05:29; 8d715d688ed88463
2018-11-01 14:40:43 condorella-rx480 Stopping, please wait..
2018-11-01 14:40:52 condorella-rx480 OK 20800/371000039 [ 0.01%], 19.17 ms/it [18.99, 19.34]; ETA 82d 07:13; d6286177c0386521 (check 9.41s)
2018-11-01 14:40:52 condorella-rx480 Starting GCD over 0 points
2018-11-01 14:40:53 condorella-rx480 Waiting for GCD to finish..
2018-11-01 14:40:53 condorella-rx480 Exiting because "stop requested"
2018-11-01 14:40:53 condorella-rx480 Bye[/CODE]
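A back-of-envelope check of the ETA field in these progress lines, assuming it is roughly (remaining iterations) x (ms/it); gpuowl's exact averaging and rounding are not known here, and the helper name is hypothetical. The useful point is sensitivity: ms/it is printed to 0.01 ms, so the true rate can differ by up to 0.005 ms, which over ~371M remaining iterations is about 1855 s, roughly half an hour. The two calls below bracket the logged "ETA 82d 13:15" at 800/371000039 and 19.23 ms/it.

```python
def eta(done: int, total: int, ms_per_it: float) -> str:
    """Hypothetical helper: (total - done) * ms/it, shown as 'Nd HH:MM'.

    Assumes the ETA is simply remaining iterations times the printed rate;
    leftover seconds are truncated, not rounded.
    """
    seconds = int((total - done) * ms_per_it / 1000)
    days, rem = divmod(seconds, 86400)
    hours, rem = divmod(rem, 3600)
    return f"{days}d {hours:02d}:{rem // 60:02d}"


if __name__ == "__main__":
    print(eta(800, 371000039, 19.225))  # 82d 13:14 (low edge of the 19.23 rounding interval)
    print(eta(800, 371000039, 19.23))   # 82d 13:45
```

The same sensitivity applies when comparing versions by ms/it: a difference in the second decimal place is worth hours on a 371M exponent.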

kriesel 2018-11-01 23:24

gpuowl V5.0-f604bb1 for Win7-x64 AMD opencl 2.0
 
1 Attachment(s)
This is a build of the Oct 31 commit, and does not have the PRP3 P fix or the B2!=0 fix. It's been a long day getting to this point. Perhaps someone could beat on this code, find an issue or two that have eluded us so far, and report them to Preda or here in this thread.

kriesel 2018-11-01 23:45

Some benchmarking, V3.x, 4.6, 5.0
 
Today's benchmarking results on RX480 in Windows for V4.6 and 5.0 and a bit more in 3.x have been added. See the second attachment at [URL]https://www.mersenneforum.org/showpost.php?p=488535&postcount=2[/URL]

kriesel 2018-11-02 01:08

gpuowl V5.0-f604bb1 for Win7-x64 AMD opencl 2.0
 
1 Attachment(s)
Oops: I had too many "manage attachments" windows open, and the wrong one was posted in post 829.

_This_ attachment is, I hope, the build of the Oct 31 V5.0 commit, and does not have the PRP3 P fix or the B2!=0 fix. It's been a long day getting to this point. Perhaps someone could beat on this code, find an issue or two that have eluded us so far, and report them to Preda or here in this thread.

kriesel 2018-11-02 14:34

openowl V4.3-537c681
 
1 Attachment(s)
As with v4.6, I commented out an assert in gpu.cpp and it compiled. The only testing I've done on this build was a run of M1257787 with -B1 200 on the command line.
[CODE]C:\msys64\home\ken\gpuowl-compile\v4.3>openowl -B1 200
2018-11-02 09:22:12 gpuowl 4.3-537c681
2018-11-02 09:22:12 FFT 512K: Width 512 (64x8), Height 512 (64x8); 2.40 bits/word
2018-11-02 09:22:12 Note: using long carry kernels
2018-11-02 09:22:13 Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
2018-11-02 09:22:17 OpenCL compilation in 3861 ms, with "-DEXP=1257787u -DWIDTH=512u -DSMALL_HEIGHT=512u -DMIDDLE=1u -I. -cl-fast-relaxed-math -cl-std=CL2.0 "
2018-11-02 09:22:17 PRP M(1257787), FFT 512K, 2.40 bits/word, B1 200
2018-11-02 09:22:17 Starting P-1 first-stage GCD
2018-11-02 09:22:18 OK loaded: 0/1257787, B1 200, blockSize 400, 62dc0af77c7b5e45 (expected 62dc0af77c7b5e45)
2018-11-02 09:22:19 GCD says: still no factor
2018-11-02 09:22:19 OK 800/1257787 [ 0.06%], 1.70 ms/it [0.53, 2.87]; ETA 0d 00:36; 49283f488d1734a4 (check 0.23s)
2018-11-02 09:22:24 10000/1257787 [ 0.79%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:11; 8093e635acf1cfb6
2018-11-02 09:22:29 20000/1257787 [ 1.59%], 0.51 ms/it [0.51, 0.53]; ETA 0d 00:11; f876ff6c60cfdefb
2018-11-02 09:22:34 30000/1257787 [ 2.38%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:11; 2e84cedbb7da818a
2018-11-02 09:22:39 40000/1257787 [ 3.18%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:10; 71f45d92aae292bd
2018-11-02 09:22:45 50000/1257787 [ 3.97%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:10; 1fd28086ab6826a0
2018-11-02 09:22:50 60000/1257787 [ 4.77%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:10; c2b180c2f9b281b7
2018-11-02 09:22:55 70000/1257787 [ 5.56%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:10; 22138c0bf6a96a9a
2018-11-02 09:23:00 80000/1257787 [ 6.36%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:10; 7ae80137efaa8689
2018-11-02 09:23:05 90000/1257787 [ 7.15%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:10; e97c39f5383fe107
2018-11-02 09:23:10 100000/1257787 [ 7.95%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:10; e34f0813e6847496
2018-11-02 09:23:15 110000/1257787 [ 8.74%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:10; 1f145f046a7eaba7
2018-11-02 09:23:21 120000/1257787 [ 9.54%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:10; 056ef16631c69eaa
2018-11-02 09:23:26 130000/1257787 [10.33%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:10; b819516ee5b773d8
2018-11-02 09:23:31 140000/1257787 [11.13%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:10; 4ec0e61c87765e83
2018-11-02 09:23:36 150000/1257787 [11.92%], 0.51 ms/it [0.51, 0.53]; ETA 0d 00:10; 4afe585b6826d6ea
2018-11-02 09:23:41 OK 160000/1257787 [12.72%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:09; 102a9f8f606a9969 (check 0.23s)
2018-11-02 09:23:47 170000/1257787 [13.51%], 0.51 ms/it [0.51, 0.53]; ETA 0d 00:09; 6948c87a300c0323
2018-11-02 09:23:52 180000/1257787 [14.31%], 0.52 ms/it [0.50, 0.55]; ETA 0d 00:09; a80d8c720e0397a0
2018-11-02 09:23:57 190000/1257787 [15.10%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:09; ca41166d67c187c0
2018-11-02 09:24:02 200000/1257787 [15.90%], 0.52 ms/it [0.50, 0.55]; ETA 0d 00:09; 9c63150ece06844f
2018-11-02 09:24:07 210000/1257787 [16.69%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:09; 32a7bba6c1712924
2018-11-02 09:24:12 220000/1257787 [17.49%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:09; d56b09504be1099a
2018-11-02 09:24:18 230000/1257787 [18.28%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:09; 3b3f8813a99b3bab
2018-11-02 09:24:23 240000/1257787 [19.08%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:09; 11a270ba2de54e92
2018-11-02 09:24:28 250000/1257787 [19.87%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:09; 85540b740a9392d2
2018-11-02 09:24:33 260000/1257787 [20.67%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:08; 1d9c171966185cb5
2018-11-02 09:24:38 270000/1257787 [21.46%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:08; e3e8928dce8c4def
2018-11-02 09:24:43 280000/1257787 [22.26%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:08; 6b41102946646d7e
2018-11-02 09:24:48 290000/1257787 [23.05%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:08; 6c7a6eb5feb62a7a
2018-11-02 09:24:54 300000/1257787 [23.85%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:08; a618fd79e0918820
2018-11-02 09:24:59 310000/1257787 [24.64%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:08; 027b746c63f990ff
2018-11-02 09:25:04 OK 320000/1257787 [25.44%], 0.52 ms/it [0.50, 0.55]; ETA 0d 00:08; 836bb30e924e5458 (check 0.23s)
2018-11-02 09:25:09 330000/1257787 [26.23%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:08; e2bdfedf3209f7c6
2018-11-02 09:25:14 340000/1257787 [27.03%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:08; 33f782cceecaadc5
2018-11-02 09:25:20 350000/1257787 [27.82%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:08; 19ff5285cda8a38a
2018-11-02 09:25:25 360000/1257787 [28.62%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:08; 06a4d584a35f58f0
2018-11-02 09:25:30 370000/1257787 [29.41%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:08; 9e81d8b362ac3f23
2018-11-02 09:25:35 380000/1257787 [30.21%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:08; 8971f1a6c177c1b3
2018-11-02 09:25:40 390000/1257787 [31.00%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:07; 668095c791414457
2018-11-02 09:25:45 400000/1257787 [31.80%], 0.51 ms/it [0.50, 0.56]; ETA 0d 00:07; b148a85bec77cd86
2018-11-02 09:25:51 410000/1257787 [32.59%], 0.52 ms/it [0.51, 0.55]; ETA 0d 00:07; d19e0e57c86152fe
2018-11-02 09:25:56 420000/1257787 [33.39%], 0.51 ms/it [0.51, 0.53]; ETA 0d 00:07; a0df442bc41ac61c
2018-11-02 09:26:01 430000/1257787 [34.18%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:07; 2eed708adb5ebaa9
2018-11-02 09:26:06 440000/1257787 [34.98%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:07; 4d1557a7c215ad9c
2018-11-02 09:26:11 450000/1257787 [35.77%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:07; 5af120c5e6d6fd2c
2018-11-02 09:26:16 460000/1257787 [36.57%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:07; 5ecc4128fac1c94d
2018-11-02 09:26:22 470000/1257787 [37.36%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:07; aa4f13d1d3b1f375
2018-11-02 09:26:27 OK 480000/1257787 [38.16%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:07; 49a1d8aa8cb20a19 (check 0.23s)
2018-11-02 09:26:32 490000/1257787 [38.95%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:07; cfe1f6602e7bd39c
2018-11-02 09:26:37 500000/1257787 [39.75%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:07; d1331bffeb965c61
2018-11-02 09:26:42 510000/1257787 [40.54%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:06; ef3905d00237a8ff
2018-11-02 09:26:48 520000/1257787 [41.34%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:06; b3ae3eac3dd84e58
2018-11-02 09:26:53 530000/1257787 [42.13%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:06; 67b4b7fb8315d59c
2018-11-02 09:26:58 540000/1257787 [42.93%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:06; 3bab9e13ed7beb8b
2018-11-02 09:27:03 550000/1257787 [43.72%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:06; e02e0b2c4e8a41ed
2018-11-02 09:27:08 560000/1257787 [44.52%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:06; 9c75b983083b39e7
2018-11-02 09:27:13 570000/1257787 [45.31%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:06; 2cb5ee3b2cd9cbc2
2018-11-02 09:27:18 580000/1257787 [46.10%], 0.51 ms/it [0.51, 0.53]; ETA 0d 00:06; eae845ead78fccf6
2018-11-02 09:27:24 590000/1257787 [46.90%], 0.51 ms/it [0.51, 0.53]; ETA 0d 00:06; 6ecd5bddb56b159a
2018-11-02 09:27:29 600000/1257787 [47.69%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:06; b17567d3c846ec14
2018-11-02 09:27:34 610000/1257787 [48.49%], 0.52 ms/it [0.51, 0.54]; ETA 0d 00:06; c7eee3e600cfd4e4
2018-11-02 09:27:39 620000/1257787 [49.28%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:05; 46da87fbdf1f0ef0
2018-11-02 09:27:44 630000/1257787 [50.08%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:05; d681dd33c0e26121
2018-11-02 09:27:50 OK 640000/1257787 [50.87%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:05; 5407ff4901212ebe (check 0.23s)
2018-11-02 09:27:55 650000/1257787 [51.67%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:05; 387f070197d9e625
2018-11-02 09:28:00 660000/1257787 [52.46%], 0.51 ms/it [0.51, 0.53]; ETA 0d 00:05; 847d618a9fd8cdbc
2018-11-02 09:28:05 670000/1257787 [53.26%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:05; 34d5fc9e6dc12cf3
2018-11-02 09:28:10 680000/1257787 [54.05%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:05; 53155109fc3602a6
2018-11-02 09:28:15 690000/1257787 [54.85%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:05; 84a3d821b89aad62
2018-11-02 09:28:20 700000/1257787 [55.64%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:05; f5068734c454faa8
2018-11-02 09:28:26 710000/1257787 [56.44%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:05; a4297276b82798c9
2018-11-02 09:28:31 720000/1257787 [57.23%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:05; ef05cab9b877532b
2018-11-02 09:28:36 730000/1257787 [58.03%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:05; b5b7dc1a2dd24d29
2018-11-02 09:28:41 740000/1257787 [58.82%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:04; aee981b6e6a3acf6
2018-11-02 09:28:46 750000/1257787 [59.62%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:04; d65b0a3817c57abb
2018-11-02 09:28:51 760000/1257787 [60.41%], 0.52 ms/it [0.51, 0.52]; ETA 0d 00:04; b9a3ce8f9a4a5c56
2018-11-02 09:28:56 770000/1257787 [61.21%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:04; 27172fc29aaff8bb
2018-11-02 09:29:02 780000/1257787 [62.00%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:04; 6e1c86d8d421ed40
2018-11-02 09:29:07 790000/1257787 [62.80%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:04; b7eed42cede94986
2018-11-02 09:29:12 OK 800000/1257787 [63.59%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:04; 1cdb3292873b5558 (check 0.24s)
2018-11-02 09:29:17 810000/1257787 [64.39%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:04; 7d9316a37bfab070
2018-11-02 09:29:22 820000/1257787 [65.18%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:04; 5382de0521f4d255
2018-11-02 09:29:27 830000/1257787 [65.98%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:04; 8e394efdf7cd9510
2018-11-02 09:29:33 840000/1257787 [66.77%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:04; 3d7a0759da92dd65
2018-11-02 09:29:38 850000/1257787 [67.57%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:03; 5f23f2eb0f309b92
2018-11-02 09:29:43 860000/1257787 [68.36%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:03; d1fd9defb926037b
2018-11-02 09:29:48 870000/1257787 [69.16%], 0.52 ms/it [0.51, 0.52]; ETA 0d 00:03; ea1263abf2c765e1
2018-11-02 09:29:53 880000/1257787 [69.95%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:03; 6e6a097ea4f341b2
2018-11-02 09:29:58 890000/1257787 [70.75%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:03; f3b51333a0d553ac
2018-11-02 09:30:04 900000/1257787 [71.54%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:03; 777540db62beb9a6
2018-11-02 09:30:09 910000/1257787 [72.34%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:03; 43fd78454ecc3c50
2018-11-02 09:30:14 920000/1257787 [73.13%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:03; d3a89c3b55a4d991
2018-11-02 09:30:19 930000/1257787 [73.93%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:03; 103525740c060472
2018-11-02 09:30:24 940000/1257787 [74.72%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:03; 93d4eff259dde7bd
2018-11-02 09:30:29 950000/1257787 [75.52%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:03; 4a8c90a1e800cefe
2018-11-02 09:30:35 OK 960000/1257787 [76.31%], 0.51 ms/it [0.51, 0.53]; ETA 0d 00:03; 6b998a4097ffd41e (check 0.23s)
2018-11-02 09:30:40 970000/1257787 [77.11%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:02; 5ac82bde294904e8
2018-11-02 09:30:45 980000/1257787 [77.90%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:02; 1b937113ce55edb0
2018-11-02 09:30:50 990000/1257787 [78.70%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:02; 88518875f92661c5
2018-11-02 09:30:55 1000000/1257787 [79.49%], 0.51 ms/it [0.51, 0.53]; ETA 0d 00:02; 498b633d92b92722
2018-11-02 09:31:00 1010000/1257787 [80.29%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:02; 9b84a4a2627af8ca
2018-11-02 09:31:05 1020000/1257787 [81.08%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:02; 09c01bef03a22632
2018-11-02 09:31:11 1030000/1257787 [81.88%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:02; 86e99dda1efc8dc3
2018-11-02 09:31:16 1040000/1257787 [82.67%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:02; 0167d4f540311245
2018-11-02 09:31:21 1050000/1257787 [83.47%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:02; 54bcffd627fd4732
2018-11-02 09:31:26 1060000/1257787 [84.26%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:02; e98afd01f99bf64e
2018-11-02 09:31:31 1070000/1257787 [85.06%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:02; dc6312680e7c5297
2018-11-02 09:31:36 1080000/1257787 [85.85%], 0.51 ms/it [0.51, 0.53]; ETA 0d 00:02; 1180f3968def2760
2018-11-02 09:31:42 1090000/1257787 [86.65%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:01; 9b0be9d76b160ce8
2018-11-02 09:31:47 1100000/1257787 [87.44%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:01; 0dbdbb536300bdc2
2018-11-02 09:31:52 1110000/1257787 [88.24%], 0.51 ms/it [0.51, 0.52]; ETA 0d 00:01; 7e24e68805544414
2018-11-02 09:31:57 OK 1120000/1257787 [89.03%], 0.52 ms/it [0.51, 0.53]; ETA 0d 00:01; e469ae8eb668ee4e (check 0.23s)
2018-11-02 09:32:02 1130000/1257787 [89.83%], 0.51 ms/it [0.51, 0.54]; ETA 0d 00:01; 8c6cfaabfea0b578
2018-11-02 09:32:08 1140000/1257787 [90.62%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:01; bf4616487bc7da8c
2018-11-02 09:32:13 1150000/1257787 [91.41%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:01; 7662cf174e295762
2018-11-02 09:32:18 1160000/1257787 [92.21%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:01; d389c12ee0baf5e3
2018-11-02 09:32:23 1170000/1257787 [93.00%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:01; fc3823e4ddbe2808
2018-11-02 09:32:28 1180000/1257787 [93.80%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:01; bdc5dc18c936bb78
2018-11-02 09:32:33 1190000/1257787 [94.59%], 0.52 ms/it [0.50, 0.55]; ETA 0d 00:01; f10f9a7452fe8144
2018-11-02 09:32:38 1200000/1257787 [95.39%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:00; 49adedb139c60c75
2018-11-02 09:32:44 1210000/1257787 [96.18%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:00; d470b0903ef59a47
2018-11-02 09:32:49 1220000/1257787 [96.98%], 0.52 ms/it [0.50, 0.55]; ETA 0d 00:00; 797393e12c2235ce
2018-11-02 09:32:54 1230000/1257787 [97.77%], 0.52 ms/it [0.50, 0.55]; ETA 0d 00:00; 67123f8079e5adbd
2018-11-02 09:32:59 1240000/1257787 [98.57%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:00; 603a4c2d81d7e90a
2018-11-02 09:33:04 1250000/1257787 [99.36%], 0.51 ms/it [0.50, 0.55]; ETA 0d 00:00; a3256d3e3efc58e0
2018-11-02 09:33:08 PP 1257786 / 1257787, 62dc0af77c7b5e45 (base 62dc0af77c7b5e45)
2018-11-02 09:33:09 OK 1258000/1257787 [100.00%], 0.52 ms/it [0.50, 0.59]; ETA 0d 00:00; 03d3485e5f7cdec6 (check 0.23s)
2018-11-02 09:33:09 {"exponent":"1257787", "worktype":"PRP-1", "status":"P", "program":{"name":"gpuowl", "version":"4.3-537c681"}, "timestamp":"2018-11-02 14:33:09 UTC", "aid":"0", "res64":"62dc0af77c7b5e45", "base":{"b1":"200", "bias":{"2":19}, "res64":"62dc0af77c7b5e45"}}
2018-11-02 09:33:09 Bye[/CODE]
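The run above logs "Starting P-1 first-stage GCD" and "GCD says: still no factor". As a rough illustration of what such a GCD computes, here is the textbook P-1 stage 1 on Mp, not gpuowl's code (how gpuowl picks its "trial points" isn't shown in this thread, and the helper names are hypothetical). Every prime factor of Mp has the form q = 2kp + 1; if k is B1-powersmooth, then q - 1 divides 2*p*powerSmooth(B1), so q divides x - 1 with x = 3^(2*p*powerSmooth(B1)) mod Mp. A GCD of 1 means nothing was found; a GCD equal to Mp means every factor was caught at once.

```python
from math import gcd


def power_smooth(b1: int) -> int:
    """Assumed definition: product over primes q <= b1 of the largest q**k <= b1."""
    is_prime = [True] * (b1 + 1)
    result = 1
    for q in range(2, b1 + 1):
        if not is_prime[q]:
            continue
        for j in range(q * q, b1 + 1, q):
            is_prime[j] = False
        qk = q
        while qk * q <= b1:  # largest power of q not exceeding b1
            qk *= q
        result *= qk
    return result


def p1_stage1_gcd(p: int, b1: int) -> int:
    """Textbook P-1 stage 1 on Mp = 2^p - 1 (hypothetical helper)."""
    m = (1 << p) - 1
    x = pow(3, 2 * p * power_smooth(b1), m)
    return gcd(x - 1, m)  # 1: no factor found; m: all factors found together


if __name__ == "__main__":
    print(p1_stage1_gcd(11, 1))  # 23
    print(p1_stage1_gcd(11, 4))  # 2047
```

For M11 = 23 x 89: 23 = 2*1*11 + 1 needs only k = 1, so B1 = 1 already finds it, while 89 = 2*4*11 + 1 needs k = 4; with B1 = 4 both are caught together and the GCD returns M11 itself.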

preda 2018-11-02 21:16

[QUOTE=kriesel;499346]Similarly to v4.6, commented out an assert in gpu.cpp and it compiled. [/QUOTE]

Thanks, but why do you test 4.3 or 4.6 instead of the more recent 5.x?

(Maybe some issues have been fixed in 5.x. And maybe I introduced new bugs in 5.x too.)

kriesel 2018-11-03 04:31

[QUOTE=preda;499381]Thanks, but why do you test 4.3 or 4.6 instead of the more recent 5.x?

(Maybe some issues have been fixed in 5.x. And maybe I introduced new bugs in 5.x too.)[/QUOTE]
I tested v4.3 with a total of one little 11-minute M1257787 run, to see whether my edit left it still functional. Why would you object to that?
I worked through a backlog of builds, giving priority to the newest at the time (V5.0-f604bb1).
Execution speed seems to me to have been declining since v3.8, and V5.0 apparently continues that trend even when no P-1 is done. [URL]https://www.mersenneforum.org/showpost.php?p=499306&postcount=830[/URL]

I'm still running V3.8 for production because of its speed advantage. I think I'm not alone in that.
I may run V5 for exponents needing P-1 once it has been shown to be sufficiently reliable, provided the PRP-1 run time in V5 is shown to be less than the combined run time of P-1 in CUDAPm1 plus PRP in V3.8, run separately on comparable GPUs. My recent benchmarking indicates that V5 is often 4-5% slower than V3.8 (PRP only, not PRP-1). That difference is longer than an entire typical P-1, which takes 2-2.5% as long as a CUDALucas LL test in the 100M to 500M range, unless PRP were about twice as fast as LL.

Case in point: for an 87M exponent, P-1 on a GTX1060 takes ~5 hours and PRP in V3.8 takes 3d 22h, so the combined time is 4d 3h. Compare that to 4d 12h for gpuowl V5: 9 hours slower.
Again, for a 171M exponent: P-1 on a GTX1060 takes ~20 hours and PRP in V3.8 takes 14d 21h, for a combined 15d 17h; the estimated V5.0 PRP time is 15d 16h. If PRP-1 is within ~1 hour of PRP, V5 wins.
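The comparison above, redone as plain hour arithmetic. The helpers are hypothetical; the durations are the ones quoted in the post.

```python
def to_hours(days: int, hours: int) -> int:
    """Convert a 'Nd Hh' duration to whole hours."""
    return days * 24 + hours


def fmt(total_hours: int) -> str:
    """Format whole hours back as 'Nd Hh'."""
    days, hours = divmod(total_hours, 24)
    return f"{days}d {hours}h"


def v5_margin(pm1_hours: int, prp_v38_hours: int, v5_hours: int) -> int:
    """(CUDAPm1 P-1 + V3.8 PRP) minus the V5 time, in hours.

    Positive: V5 is ahead by that many hours. Negative: V5 is slower.
    """
    return pm1_hours + prp_v38_hours - v5_hours


if __name__ == "__main__":
    # 87M: P-1 ~5h, V3.8 PRP 3d 22h, gpuowl V5 4d 12h
    print(fmt(5 + to_hours(3, 22)), v5_margin(5, to_hours(3, 22), to_hours(4, 12)))
    # prints: 4d 3h -9
    # 171M: P-1 ~20h, V3.8 PRP 14d 21h, estimated V5 PRP 15d 16h
    print(fmt(20 + to_hours(14, 21)), v5_margin(20, to_hours(14, 21), to_hours(15, 16)))
    # prints: 15d 17h 1
```

For 171M the V5 figure is a PRP-only estimate, so the +1 h margin is the whole budget for V5's P-1 work, which is the "within ~1 hour" condition stated above.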

SELROC 2018-11-03 07:54

[QUOTE=kriesel;499410]Execution speed seems to me to have been declining since v3.8, and V5.0 apparently continues that trend even when no P-1 is done.[/QUOTE]


The fastest version was 3.5; performance regressed after that, with little recovery in 4.6.

SELROC 2018-11-03 08:34

[QUOTE=kriesel;499410]Execution speed seems to me to have been declining since v3.8, and V5.0 apparently continues that trend even when no P-1 is done.[/QUOTE]


I have opened a new issue:


[url]https://github.com/preda/gpuowl/issues/17[/url]


All times are UTC.
