mersenneforum.org  

Old 2022-06-01, 05:27   #2762
DrobinsonPE
 
Aug 2020

5·31 Posts

RX 6500 XT (XFX model RX-65XT4D)

I bought it because some people said it was a terrible card, but it was cheap enough to splurge on.
Code:
C:\Users\User\GPUOWL\v611380>gpuowl-win -iters 200000 -prp 77936867
2022-05-31 21:55:52 gpuowl v6.11-380-g79ea0cc
2022-05-31 21:55:52 config: -log 10000 -device 1
2022-05-31 21:55:52 config: -iters 200000 -prp 77936867
2022-05-31 21:55:52 device 1, unique id ''
2022-05-31 21:55:52 gfx1034-1 77936867 FFT: 4M 1K:8:256 (18.58 bpw)
2022-05-31 21:55:52 gfx1034-1 Expected maximum carry32: 583B0000
2022-05-31 21:55:52 gfx1034-1 OpenCL args "-DEXP=77936867u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=8u -DPM1=0 -DAMDGPU=1 -DMM_CHAIN=1u -DMM2_CHAIN=2u -DMAX_ACCURACY=1 -DWEIGHT_STEP_MINUS_1=0xa.c42d0d7cec038p-5 -DIWEIGHT_STEP_MINUS_1=-0x8.0e50c8817ddf8p-5  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2022-05-31 21:55:55 gfx1034-1 OpenCL compilation in 2.40 s
2022-05-31 21:55:56 gfx1034-1 77936867 OK        0 loaded: blockSize 400, 0000000000000003
2022-05-31 21:55:56 gfx1034-1 validating proof residues for power 8
2022-05-31 21:55:56 gfx1034-1 Proof using power 8
2022-05-31 21:56:00 gfx1034-1 77936867 OK      800   0.00%; 3166 us/it; ETA 2d 20:32; 1579c241dc63eca6 (check 1.31s)
2022-05-31 21:56:30 gfx1034-1 77936867 OK    10000   0.01%; 3164 us/it; ETA 2d 20:29; fc4f135f7cf4ad29 (check 1.31s)
2022-05-31 21:57:03 gfx1034-1 77936867 OK    20000   0.03%; 3163 us/it; ETA 2d 20:27; 3cd1bd9d5e09cbc5 (check 1.31s)
2022-05-31 21:57:36 gfx1034-1 77936867 OK    30000   0.04%; 3162 us/it; ETA 2d 20:26; c4e0ff35e3290d98 (check 1.31s)
2022-05-31 21:58:09 gfx1034-1 77936867 OK    40000   0.05%; 3162 us/it; ETA 2d 20:25; dffe1b1b0d748128 (check 1.31s)
2022-05-31 21:58:42 gfx1034-1 77936867 OK    50000   0.06%; 3162 us/it; ETA 2d 20:25; 52e286945371ed29 (check 1.31s)
2022-05-31 21:59:15 gfx1034-1 77936867 OK    60000   0.08%; 3161 us/it; ETA 2d 20:23; 0945da4dc08bdd95 (check 1.31s)
2022-05-31 21:59:48 gfx1034-1 77936867 OK    70000   0.09%; 3161 us/it; ETA 2d 20:22; 7131fa4eb77f4bb2 (check 1.30s)
2022-05-31 22:00:21 gfx1034-1 77936867 OK    80000   0.10%; 3161 us/it; ETA 2d 20:22; 8d76071d27ee4221 (check 1.31s)
2022-05-31 22:00:54 gfx1034-1 77936867 OK    90000   0.12%; 3161 us/it; ETA 2d 20:21; 0bacff453b2f470e (check 1.31s)
2022-05-31 22:01:27 gfx1034-1 77936867 OK   100000   0.13%; 3160 us/it; ETA 2d 20:20; 6d7296b9e2830f50 (check 1.30s)
2022-05-31 22:02:00 gfx1034-1 77936867 OK   110000   0.14%; 3161 us/it; ETA 2d 20:20; 8cbfd4435622bda7 (check 1.31s)
2022-05-31 22:02:33 gfx1034-1 77936867 OK   120000   0.15%; 3161 us/it; ETA 2d 20:20; 79ae5dad855057ad (check 1.31s)
2022-05-31 22:03:05 gfx1034-1 77936867 OK   130000   0.17%; 3162 us/it; ETA 2d 20:20; 50c97bcbf876231f (check 1.30s)
2022-05-31 22:03:38 gfx1034-1 77936867 OK   140000   0.18%; 3160 us/it; ETA 2d 20:17; e1db15f897271496 (check 1.31s)
2022-05-31 22:04:11 gfx1034-1 77936867 OK   150000   0.19%; 3160 us/it; ETA 2d 20:17; 127631386c6a9b17 (check 1.31s)
2022-05-31 22:04:44 gfx1034-1 77936867 OK   160000   0.21%; 3160 us/it; ETA 2d 20:16; 25b7b6206fc6f085 (check 1.31s)
2022-05-31 22:05:17 gfx1034-1 77936867 OK   170000   0.22%; 3160 us/it; ETA 2d 20:16; 416816b0d9f4bba8 (check 1.31s)
2022-05-31 22:05:50 gfx1034-1 77936867 OK   180000   0.23%; 3160 us/it; ETA 2d 20:15; 6bee5d054f770861 (check 1.31s)
2022-05-31 22:06:23 gfx1034-1 77936867 OK   190000   0.24%; 3160 us/it; ETA 2d 20:14; f37f068f014b18a0 (check 1.30s)
2022-05-31 22:06:53 gfx1034-1 Stopping, please wait..
2022-05-31 22:06:56 gfx1034-1 77936867 OK   200000   0.26%; 3159 us/it; ETA 2d 20:13; f0b04b45b0855bd2 (check 1.31s)
2022-05-31 22:06:56 gfx1034-1 Exiting because "stop requested"
2022-05-31 22:06:56 gfx1034-1 Bye

Power draw at the plug, measured with a Kill A Watt: 130 W
Attached thumbnails: RX-6500 XT Graphics Card .png (56.8 KB, ID 26947); RX-6500 XT Sensors .png (36.7 KB, ID 26948)
Old 2022-06-02, 13:53   #2763
DrobinsonPE
 
Aug 2020

5×31 Posts

Quote:
Originally Posted by DrobinsonPE View Post
RX 6500 XT (XFX model RX-65XT4D)
Preliminary efficiency testing: I adjusted the GPU frequency while testing a 113.8M exponent and inferred the power used by the GPU (I know how much power the computer was using at idle and with Prime95 running before the GPU was installed).

GPU at 100% frequency: 4842 us/it = about 78.3 GHz-d/day with GPU power use of about 100 W.
GPU at 80% frequency: 5352 us/it = about 70.9 GHz-d/day with GPU power use of about 50 W.

80% was the most efficient setting.
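
To put a number on "most efficient", here is a minimal Python sketch that divides the quoted throughput by the inferred GPU power. The figures are the two data points from this post; the dictionary and function names are just for illustration.

```python
# Figures quoted in the post: (throughput in GHz-d/day, inferred GPU power in W).
settings = {
    "100% frequency": (78.3, 100.0),
    "80% frequency": (70.9, 50.0),
}

def efficiency(ghz_days_per_day, watts):
    """Return throughput per watt (GHz-d/day per W)."""
    return ghz_days_per_day / watts

for name, (throughput, watts) in settings.items():
    print(f"{name}: {efficiency(throughput, watts):.2f} GHz-d/day per W")
# 100% frequency: 0.78 GHz-d/day per W
# 80% frequency: 1.42 GHz-d/day per W
```

On these rough numbers the 80% setting gives up about 9% of throughput for about half the power.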
Old 2022-06-02, 20:15   #2764
moebius
 
Jul 2009
Germany

643 Posts

Update
gpuOwl benchmarks online.xlsx

Last fiddled with by moebius on 2022-06-02 at 20:16
Old 2022-06-10, 23:30   #2765
Runtime Error
 
Sep 2017
USA

3×61 Posts
RX 6950 XT

I recently got a shiny new toy: a SAPPHIRE NITRO+ Radeon RX 6950 XT 16GB GDDR6 PCI Express 4.0 ATX video card.

Out of the box at the default 2669 MHz core clock, GPU-Z shows ~315 W (for the chip, I think?):

Code:
20220610 18:12:26 GpuOwl VERSION v7.2-93-ga5402c5-dirty
20220610 18:12:26 config: -maxAlloc 16000
20220610 18:12:26 config: -proof 10
20220610 18:12:26 config: -prp 77936867
20220610 18:12:26 config: -iters 400000
20220610 18:12:26 device 0, unique id ''
20220610 18:12:26 rx6950xt 77936867 FFT: 4M 1K:8:256 (18.58 bpw)
20220610 18:12:27 rx6950xt 77936867 OpenCL args "-DEXP=77936867u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=8u -DAMDGPU=1 -DMM_CHAIN=1u -DMM2_CHAIN=2u -DMAX_ACCURACY=1 -DWEIGHT_STEP=0.33644726404543274 -DIWEIGHT_STEP=-0.25174750481886216 -DIWEIGHTS={0,-0.44011820345520131,-0.37306474779553728,-0.29798072935699788,-0.21390437908665341,-0.11975874301407295,-0.014337887291734644,-0.44814572555075455,} -DFWEIGHTS={0,0.78609128957452257,0.5950610473469905,0.42446232150303748,0.2721098723818392,0.1360521812214803,0.014546452690911484,0.81207258201996746,}  -cl-std=CL2.0 -cl-finite-math-only "
20220610 18:12:35 rx6950xt 77936867 OpenCL compilation in 7.31 s
20220610 18:12:35 rx6950xt 77936867 maxAlloc: 15.6 GB
20220610 18:12:35 rx6950xt 77936867 P1(0) 0 bits
20220610 18:12:36 rx6950xt 77936867 OK    200000 on-load: blockSize 400, f0b04b45b0855bd2
20220610 18:12:36 rx6950xt 77936867 validating proof residues for power 10
20220610 18:12:36 rx6950xt 77936867 Proof using power 10
20220610 18:12:37 rx6950xt 77936867 OK    200800   0.26% 895b034c5473a608  559 us/it + check 0.34s + save 0.16s; ETA 12:04
20220610 18:12:42 rx6950xt 77936867    210000 43eb2fc2424d8aac  559
20220610 18:12:47 rx6950xt 77936867    220000 a1081c6dc6a7689f  560
20220610 18:12:53 rx6950xt 77936867    230000 2387818d3d3d0d01  586
20220610 18:12:59 rx6950xt 77936867    240000 a9deae45055e5216  562
20220610 18:13:05 rx6950xt 77936867    250000 89fcab15218f7cac  562
20220610 18:13:10 rx6950xt 77936867    260000 55da428da4cf928a  563
20220610 18:13:16 rx6950xt 77936867    270000 dc349756c5f05abf  563
20220610 18:13:21 rx6950xt 77936867    280000 3564af24488443f4  563
20220610 18:13:27 rx6950xt 77936867    290000 63fb281a06f78198  564
20220610 18:13:33 rx6950xt 77936867    300000 990aa099aad5bf9c  564
20220610 18:13:39 rx6950xt 77936867    310000 61e14297a2cc0096  590
20220610 18:13:44 rx6950xt 77936867    320000 37e630c5f956cf8a  565
20220610 18:13:50 rx6950xt 77936867    330000 66ccde7e28ce2b33  564
20220610 18:13:56 rx6950xt 77936867    340000 d4a7cff61adaa84e  565
20220610 18:14:01 rx6950xt 77936867    350000 d57b659cc1ca2753  565
20220610 18:14:07 rx6950xt 77936867    360000 992df79b843f90de  565
20220610 18:14:13 rx6950xt 77936867    370000 10b0b99eba490a1e  565
20220610 18:14:18 rx6950xt 77936867    380000 56b1e40cd2666109  565
20220610 18:14:24 rx6950xt 77936867    390000 ecccd874a8a0d961  591
20220610 18:14:30 rx6950xt 77936867 OK    400000   0.51% c03f94396a5aa29e  566 us/it + check 0.34s + save 0.17s; ETA 12:11
20220610 18:14:36 rx6950xt 77936867    410000 bf44242560060429  565
20220610 18:14:42 rx6950xt 77936867    420000 7f656173fb521927  566
20220610 18:14:47 rx6950xt 77936867    430000 b6c7618e60b71bb7  566
20220610 18:14:53 rx6950xt 77936867    440000 200cbc3b887fcfb2  566
20220610 18:14:59 rx6950xt 77936867    450000 4811fce58ccc9cab  566
20220610 18:15:04 rx6950xt 77936867    460000 15cb337858fe7eb1  592
20220610 18:15:10 rx6950xt 77936867    470000 ad96b1f48c8bf011  566
20220610 18:15:16 rx6950xt 77936867    480000 93e184ebad6d3cd4  566
20220610 18:15:21 rx6950xt 77936867    490000 70160baec6378071  566
20220610 18:15:27 rx6950xt 77936867    500000 591eecd8448042ad  566
20220610 18:15:33 rx6950xt 77936867    510000 8afd187213816739  566
20220610 18:15:38 rx6950xt 77936867    520000 74e993308f33ac5b  566
20220610 18:15:44 rx6950xt 77936867    530000 57c0f9c504186096  566
20220610 18:15:50 rx6950xt 77936867    540000 bcb42100a7c391ad  592
20220610 18:15:56 rx6950xt 77936867    550000 ff6f9c39e0347941  566
20220610 18:16:01 rx6950xt 77936867    560000 9a740c005539ec84  566
20220610 18:16:07 rx6950xt 77936867    570000 a82132fa0e95b673  566
20220610 18:16:13 rx6950xt 77936867    580000 200fc0c1347e2854  566
20220610 18:16:18 rx6950xt 77936867    590000 48edfb50a88114d1  566
20220610 18:16:24 rx6950xt 77936867 Stopping, please wait..
20220610 18:16:25 rx6950xt 77936867 OK    600000   0.77% b9decd65ca71b629  567 us/it + check 0.33s + save 0.17s; ETA 12:11
20220610 18:16:25 rx6950xt Exiting because "stop requested"
20220610 18:16:25 rx6950xt Bye
Underclocked to 2300 MHz, where GPU-Z says the chip draws ~190 W:

Code:
20220610 18:26:21 GpuOwl VERSION v7.2-93-ga5402c5-dirty
20220610 18:26:21 config: -maxAlloc 16000
20220610 18:26:21 config: -proof 10
20220610 18:26:21 config: -prp 77936867
20220610 18:26:21 config: -iters 400000
20220610 18:26:21 device 0, unique id ''
20220610 18:26:21 rx6950xt 77936867 FFT: 4M 1K:8:256 (18.58 bpw)
20220610 18:26:21 rx6950xt 77936867 OpenCL args "-DEXP=77936867u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=8u -DAMDGPU=1 -DMM_CHAIN=1u -DMM2_CHAIN=2u -DMAX_ACCURACY=1 -DWEIGHT_STEP=0.33644726404543274 -DIWEIGHT_STEP=-0.25174750481886216 -DIWEIGHTS={0,-0.44011820345520131,-0.37306474779553728,-0.29798072935699788,-0.21390437908665341,-0.11975874301407295,-0.014337887291734644,-0.44814572555075455,} -DFWEIGHTS={0,0.78609128957452257,0.5950610473469905,0.42446232150303748,0.2721098723818392,0.1360521812214803,0.014546452690911484,0.81207258201996746,}  -cl-std=CL2.0 -cl-finite-math-only "
20220610 18:26:29 rx6950xt 77936867 OpenCL compilation in 7.34 s
20220610 18:26:29 rx6950xt 77936867 maxAlloc: 15.6 GB
20220610 18:26:30 rx6950xt 77936867 P1(0) 0 bits
20220610 18:26:30 rx6950xt 77936867 PRP starting from beginning
20220610 18:26:30 rx6950xt 77936867 OK         0 on-load: blockSize 400, 0000000000000003
20220610 18:26:30 rx6950xt 77936867 validating proof residues for power 10
20220610 18:26:30 rx6950xt 77936867 Proof using power 10
20220610 18:26:31 rx6950xt 77936867 OK       800   0.00% 1579c241dc63eca6  647 us/it + check 0.37s + save 0.16s; ETA 14:00
20220610 18:26:37 rx6950xt 77936867     10000 fc4f135f7cf4ad29  646
20220610 18:26:43 rx6950xt 77936867     20000 3cd1bd9d5e09cbc5  645
20220610 18:26:50 rx6950xt 77936867     30000 c4e0ff35e3290d98  646
20220610 18:26:56 rx6950xt 77936867     40000 dffe1b1b0d748128  645
20220610 18:27:03 rx6950xt 77936867     50000 52e286945371ed29  645
20220610 18:27:09 rx6950xt 77936867     60000 0945da4dc08bdd95  645
20220610 18:27:16 rx6950xt 77936867     70000 7131fa4eb77f4bb2  645
20220610 18:27:22 rx6950xt 77936867     80000 8d76071d27ee4221  671
20220610 18:27:29 rx6950xt 77936867     90000 0bacff453b2f470e  645
20220610 18:27:35 rx6950xt 77936867    100000 6d7296b9e2830f50  645
20220610 18:27:42 rx6950xt 77936867    110000 8cbfd4435622bda7  645
20220610 18:27:48 rx6950xt 77936867    120000 79ae5dad855057ad  645
20220610 18:27:55 rx6950xt 77936867    130000 50c97bcbf876231f  645
20220610 18:28:01 rx6950xt 77936867    140000 e1db15f897271496  645
20220610 18:28:07 rx6950xt 77936867    150000 127631386c6a9b17  645
20220610 18:28:14 rx6950xt 77936867    160000 25b7b6206fc6f085  685
20220610 18:28:21 rx6950xt 77936867    170000 416816b0d9f4bba8  645
20220610 18:28:27 rx6950xt 77936867    180000 6bee5d054f770861  645
20220610 18:28:34 rx6950xt 77936867    190000 f37f068f014b18a0  645
20220610 18:28:41 rx6950xt 77936867 OK    200000   0.26% f0b04b45b0855bd2  645 us/it + check 0.37s + save 0.17s; ETA 13:56
20220610 18:28:47 rx6950xt 77936867    210000 43eb2fc2424d8aac  645
20220610 18:28:54 rx6950xt 77936867    220000 a1081c6dc6a7689f  645
20220610 18:29:00 rx6950xt 77936867    230000 2387818d3d3d0d01  671
20220610 18:29:07 rx6950xt 77936867    240000 a9deae45055e5216  645
20220610 18:29:13 rx6950xt 77936867    250000 89fcab15218f7cac  645
20220610 18:29:20 rx6950xt 77936867    260000 55da428da4cf928a  645
20220610 18:29:26 rx6950xt 77936867    270000 dc349756c5f05abf  645
20220610 18:29:33 rx6950xt 77936867    280000 3564af24488443f4  645
20220610 18:29:39 rx6950xt 77936867    290000 63fb281a06f78198  645
20220610 18:29:45 rx6950xt 77936867    300000 990aa099aad5bf9c  645
20220610 18:29:52 rx6950xt 77936867    310000 61e14297a2cc0096  671
20220610 18:29:59 rx6950xt 77936867    320000 37e630c5f956cf8a  645
20220610 18:30:05 rx6950xt 77936867    330000 66ccde7e28ce2b33  645
20220610 18:30:11 rx6950xt 77936867    340000 d4a7cff61adaa84e  645
20220610 18:30:18 rx6950xt 77936867    350000 d57b659cc1ca2753  645
20220610 18:30:24 rx6950xt 77936867    360000 992df79b843f90de  645
20220610 18:30:31 rx6950xt 77936867    370000 10b0b99eba490a1e  645
20220610 18:30:37 rx6950xt 77936867    380000 56b1e40cd2666109  645
20220610 18:30:44 rx6950xt 77936867    390000 ecccd874a8a0d961  671
20220610 18:30:50 rx6950xt 77936867 Stopping, please wait..
20220610 18:30:51 rx6950xt 77936867 OK    400000   0.51% c03f94396a5aa29e  645 us/it + check 0.37s + save 0.17s; ETA 13:54
20220610 18:30:51 rx6950xt Exiting because "stop requested"
20220610 18:30:51 rx6950xt Bye
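
For comparing runs like the two above, a small Python sketch can pull the us/it column out of the plain progress lines and take the median, which is less affected by the periodic slower lines (586, 590, 671, ...) than the mean. The regular expression is inferred from the v7.2 log format shown in this post, not from GpuOwl's source, so treat it as an assumption; the function name is hypothetical.

```python
import re
import statistics

# Plain progress lines end with: <iteration> <16-hex res64> <us/it>.
# Pattern inferred from the v7.2 log in this post (an assumption).
PROGRESS = re.compile(r"\s(\d+) ([0-9a-f]{16})\s+(\d+)\s*$")

def iteration_times(log_text):
    """Return the us/it values from plain progress lines (OK lines are skipped)."""
    times = []
    for line in log_text.splitlines():
        match = PROGRESS.search(line)
        if match:
            times.append(int(match.group(3)))
    return times

# A few lines copied from the first log above.
SAMPLE = """\
20220610 18:12:37 rx6950xt 77936867 OK    200800   0.26% 895b034c5473a608  559 us/it + check 0.34s + save 0.16s; ETA 12:04
20220610 18:12:42 rx6950xt 77936867    210000 43eb2fc2424d8aac  559
20220610 18:12:47 rx6950xt 77936867    220000 a1081c6dc6a7689f  560
20220610 18:12:53 rx6950xt 77936867    230000 2387818d3d3d0d01  586
"""

times = iteration_times(SAMPLE)
print(times, statistics.median(times))
```

Multiplying a typical iteration time by the GPU-Z power reading gives a rough energy per iteration: about 315 W × 566 us ≈ 0.18 J at stock versus about 190 W × 645 us ≈ 0.12 J underclocked, with the caveat that both power figures are approximate.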
I do not know how to tune gpuowl myself, but when I tried the same options that DrDerpenberg used in post #2732, it eventually threw an EE instead of an OK. Would someone please educate me on tuning gpuowl? The readme says "see the #if tests in gpuowl.cl", but I am not sure where to find gpuowl.cl. I have attempted to follow Kriesel's documentation here and the discussion here, but quite frankly I am lost. Any help would be greatly appreciated! Thank you

PS: I have been away for too long!
Old 2022-06-12, 13:32   #2766
Zhangrc
 
"University student"
May 2021
Beijing, China

269 Posts
Why does proof initialization take so long?

On Google Colab, Tesla T4:
Code:
rm: cannot remove 'memlock-0': No such file or directory
20220612 13:20:23 GpuOwl VERSION v7.2-91-g9c22195
20220612 13:20:23 GpuOwl VERSION v7.2-91-g9c22195
20220612 13:20:23 config: -maxAlloc 9.6G -log 40000 -B1 800000 
20220612 13:20:23 device 0, unique id ''
20220612 13:20:24 Tesla T4-0 113032481 FFT: 6M 1K:12:256 (17.97 bpw)
20220612 13:20:24 Tesla T4-0 113032481 OpenCL args "-DEXP=113032481u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=12u -DWEIGHT_STEP=0.023826314173892589 -DIWEIGHT_STEP=-0.023271832188761062 -DIWEIGHTS={0,-0.046002086204100269,-0.089887980473071061,-0.13175503205073663,-0.17169611191261491,-0.20979981877560222,-0.24615067563078263,-0.28082931723531812,} -DFWEIGHTS={0,0.048220321594898211,0.098765842604511836,0.15174868489239071,0.20728637687440288,0.26550211422442604,0.32652503315135151,0.39049049685359266,}  -cl-std=CL2.0 -cl-finite-math-only "
20220612 13:20:26 Tesla T4-0 113032481 

20220612 13:20:26 Tesla T4-0 113032481 OpenCL compilation in 1.77 s
20220612 13:20:26 Tesla T4-0 113032481 maxAlloc: 9.6 GB
20220612 13:20:27 Tesla T4-0 113032481 P1(0) 0 bits
20220612 13:20:30 Tesla T4-0 113032481 OK  95094000 on-load: blockSize 400, 05916431a052c7c0
20220612 13:20:30 Tesla T4-0 113032481 validating proof residues for power 8
20220612 13:22:54 Tesla T4-0 113032481 Proof using power 8
20220612 13:23:02 Tesla T4-0 113032481 OK  95094800  84.13% edcfbaa6de2347ec 6171 us/it + check 2.53s + save 0.23s; ETA 1d 06:45 1 errors
20220612 13:23:33 Tesla T4-0 113032481  95100000 d19f15dc27b170f7 6121
Took 144 seconds.
I'm using a free account, so I get only about 3 hours of usage per day. One minute has to be spent on environment configuration and another 2.5 minutes on proof initialization, both of which waste a lot of time.
AFAIK, Prime95 does not spend this much time on proof initialization; it usually starts in less than a second.
Why does proof initialization take so long? Could it be avoided?
Anyway, I will have submitted this exponent by the end of this month, when I'll make an upgrade and see whether the problem resolves itself. But it's still really strange.

Last fiddled with by Zhangrc on 2022-06-12 at 13:37
Old 2022-06-12, 14:55   #2767
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2^6×3×37 Posts

Here's what I think is happening. Power 8 means 256 residues will be stored for proof generation. 84% means ~215 have already been stored.
Validation means they are being read and checked. Their MD5 values get checked IIRC. That occurs with one CPU thread.
With local GPU and CPU, I have observed low GPU utilization during that period. See https://mersenneforum.org/showpost.p...postcount=2761

Validation duration increases on the same assignment and hardware as the PRP residue collection progresses. The more residues there are to check, the longer it takes, roughly linearly in the number to check.
Checking ~23 residues takes ~22 seconds (15:10:01 to 15:10:23), ~0.96 second per stored residue (±~0.1):
Code:
2022-05-14 15:10:01 asr2-radeonvii3 validating proof residues for power 10 
2022-05-14 15:10:23 asr2-radeonvii3 Proof using power 10 (vs 11) for 405001661 
2022-05-14 15:12:00 asr2-radeonvii3 405001661 OK  9320000   2.30%; 4812 us/it; ETA 22d 00:57; f5d52a30056294db (check 49.38s)
Checking ~679 residues takes longer: 07:46:13 to 07:58:34 is 12:21, ~741 seconds, or ~1.09 seconds each:
Code:
2022-06-12 07:46:13 asr2-radeonvii3 validating proof residues for power 10
2022-06-12 07:58:34 asr2-radeonvii3 Proof using power 10 (vs 11) for 405001661
2022-06-12 08:00:52 asr2-radeonvii3 405001661 OK 268620000  66.32%; 4561 us/it; ETA 7d 04:49; 622676663e5aed30 (check 46.92s)
Note this is a power 10 run, on Radeon VII GPU, Windows10, Celeron G1840 cpu system.
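
The estimate above can be reproduced with a short Python sketch. It assumes, as described in this post, that a power-N proof stores 2^N residues spread evenly over the run, so the number already saved is about progress × 2^N. The timestamps and percentages are taken from the logs in this thread; the function names are hypothetical.

```python
from datetime import datetime

def stored_residues(power, progress):
    """Approximate count of proof residues already saved.

    Assumes 2**power residues spaced evenly over the run."""
    return int((2 ** power) * progress)

def seconds_between(start, end):
    """Elapsed seconds between two HH:MM:SS timestamps on the same day."""
    fmt = "%H:%M:%S"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()

# (label, proof power, progress fraction, validation start, validation end)
runs = [
    ("Radeon VII, early", 10, 0.0230, "15:10:01", "15:10:23"),
    ("Radeon VII, late", 10, 0.6632, "07:46:13", "07:58:34"),
    ("Tesla T4 (Colab)", 8, 0.8413, "13:20:30", "13:22:54"),
]

for label, power, progress, start, end in runs:
    count = stored_residues(power, progress)
    elapsed = seconds_between(start, end)
    print(f"{label}: {count} residues, {elapsed:.0f} s, {elapsed / count:.2f} s each")
```

The per-residue cost comes out near 1 second on the Celeron system and somewhat lower on the Colab T4 instance, which fits the roughly linear picture.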
I've had multiple proof generation failures on prime95, after lengthy runs. (Not the same as the partial proof uploads issue; typically there is an MD5 check failure at local proof generation time on the client.) I don't recall proof generation failures on gpuowl.

A T4 GPU is fast in TF and particularly slow in DP among Colab GPUs. Gpuowl would run faster on a K80 (or almost any other Colab model except a P4) https://www.mersenneforum.org/showpo...5&postcount=15

Last fiddled with by kriesel on 2022-06-12 at 15:27
Old 2022-06-13, 11:56   #2768
Zhangrc
 
"University student"
May 2021
Beijing, China

269 Posts

I don't think that's necessary. Could I disable it with a command-line setting?
I mean, I only intend to disable validation, not the proof itself.
If it's impossible, I might be forced to use a smaller proof power. It's about 100 minutes or more spent just validating proof files, and the CPU cores on Colab are really slow at handling it.
@kriesel: If you're correct, that indicates the proof validation time is almost proportional to 4^ProofPower, which makes it reasonable to decrease the proof power.
Also, I see about the same performance for T4 and K80. Maybe there were some commits in GpuOwl that reduced performance by some 30%, but I can't remember which thread discussed this issue.

Last fiddled with by Zhangrc on 2022-06-13 at 12:09
Old 2022-06-13, 14:49   #2769
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1101111000000₂ Posts

Quote:
Originally Posted by Zhangrc View Post
Could I disable it by command settings?
... the proof validation time is almost proportional to 4^ProofPower, which makes it reasonable to decrease proof power.
Also, I see about the same performance for T4 and K80.
I see no option in help.txt to skip proof residue validation on startup, so you would need to edit the source code and recompile.
I think the proof validation time is ~ k·2^ProofPower, where k depends on the total number of restarts and their distribution during the run, and on CPU or RAM speed. You're right that DP performance is close between the T4 (59 GFLOPS) and the K80 (115/2 = 57.5 GFLOPS per GPU). On Colab VMs one gets half a K80 card, i.e. 1 of the 2 GPUs on a K80 card. I had briefly forgotten about that 2-GPU factor on the K80.

Last fiddled with by kriesel on 2022-06-13 at 14:51
Old 2022-06-15, 11:00   #2770
tdulcet
 
"Teal Dulcet"
Jun 2018

1078 Posts

Quote:
Originally Posted by Zhangrc View Post
Also, I see about the same performance for T4 and K80.
That is not what I observed recently when benchmarking the Colab GPUs with M57885161 for @James Heinrich (speeds are in us/iter):

GpuOwl master branch
  • A100 - 220
  • Tesla V100 - 348
  • Tesla P100 - 565
  • Tesla K80 - 2273
  • Tesla T4 - 3066
  • Tesla P4 - 4229
GpuOwl v6 branch
  • A100 - 204
  • Tesla V100 - 335
  • Tesla P100 - 561
  • Tesla K80 - 2139
  • Tesla T4 - 3082
  • Tesla P4 - 4220
CUDALucas
  • A100 - 257.8
  • Tesla V100 - 572.8
  • Tesla P100 - 891.2
  • Tesla K80 - 3058.2
  • Tesla T4 - 4031.2
  • Tesla P4 - 5107.5
As you can see, the K80 needs roughly 25-30% less time per iteration than the T4.
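
The K80 vs T4 gap depends on whether it is expressed as time saved per iteration or as extra throughput. A minimal Python sketch using only the us/iter figures listed above (function names are illustrative):

```python
# us/iter figures for the K80 and T4 from the list above.
benchmarks = {
    "GpuOwl master": {"Tesla K80": 2273, "Tesla T4": 3066},
    "GpuOwl v6": {"Tesla K80": 2139, "Tesla T4": 3082},
    "CUDALucas": {"Tesla K80": 3058.2, "Tesla T4": 4031.2},
}

def time_saved_percent(fast_us, slow_us):
    """How much less time per iteration the faster GPU needs, in percent."""
    return 100.0 * (1.0 - fast_us / slow_us)

def throughput_gain_percent(fast_us, slow_us):
    """How many more iterations per second the faster GPU does, in percent."""
    return 100.0 * (slow_us / fast_us - 1.0)

for program, us in benchmarks.items():
    k80, t4 = us["Tesla K80"], us["Tesla T4"]
    print(f"{program}: {time_saved_percent(k80, t4):.1f}% less time, "
          f"{throughput_gain_percent(k80, t4):.1f}% more throughput")
```

Measured as time per iteration the K80 is about 24-31% ahead; measured as throughput the same data gives about 32-44%.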

Quote:
Originally Posted by Zhangrc View Post
Maybe there were some commits in GpuOwl that reduced performance by some 30%, but I can't remember which thread discussed this issue.
There was a severe performance regression that caused an over 100% performance degradation on many Nvidia GPUs. @preda fixed that in v7.2-94-g3c23546, although performance on these GPUs has still slowly degraded up to 15% across all FFT lengths. See here for more information and several graphs. The original thread is here: https://www.mersenneforum.org/showth...340#post605340
Old 2022-06-19, 17:13   #2771
Runtime Error
 
Sep 2017
USA

3·61 Posts
RX 6950 XT

Quote:
Originally Posted by Runtime Error View Post
Would someone please educate me on tuning gpuowl? [...] Any help would be greatly appreciated! Thank you
I've tinkered with this a bit (strictly copy-pasting from others' forum posts), and the following seems to give the best performance on wavefront (112M) exponents:

Code:
-use NEW_FFT5,NEW_FFT8,TRIG_COMPUTE=0,UNROLL_WIDTH=1,OUT_WG=128,OUT_SIZEX=16,OUT_SPACING=2,IN_WG=128,IN_SIZEX=16,IN_SPACING=1
I do not understand the memory options. Is there a way to generate a set of possible valid timings to try?

Quote:
Originally Posted by Runtime Error View Post
when I've tried using the same commands that DrDerpenberg used in post #2732, it eventually throws an EE instead an OK.
I suspect the FFT length (-fft 512:8:512) specified in that post might be too aggressive for 77936867. Adding that FFT choice is the only way I have seen EE thus far.
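
A quick arithmetic check on the FFT sizes may be useful here. The sketch below assumes (inferred from the log lines in this thread, not taken from GpuOwl's source) that a spec A:B:C maps to WIDTH:MIDDLE:SMALL_HEIGHT and that the FFT length in words is 2 × WIDTH × MIDDLE × SMALL_HEIGHT. That assumption reproduces the "18.58 bpw" and "17.97 bpw" values printed in the logs.

```python
def fft_words(width, middle, small_height):
    """FFT length in words; the factor 2 is an assumption inferred from the logs."""
    return 2 * width * middle * small_height

def bits_per_word(exponent, width, middle, small_height):
    """Average number of exponent bits stored per FFT word."""
    return exponent / fft_words(width, middle, small_height)

# "4M 1K:8:256 (18.58 bpw)" from the RX 6950 XT log
print(round(bits_per_word(77936867, 1024, 8, 256), 2))
# "6M 1K:12:256 (17.97 bpw)" from the Tesla T4 log
print(round(bits_per_word(113032481, 1024, 12, 256), 2))
# The 512:8:512 variant has the same length as 1K:8:256
print(fft_words(512, 8, 512) == fft_words(1024, 8, 256))
```

Under that assumption 512:8:512 is also a 4M FFT with the same bits per word as the default 1K:8:256, so if it is less reliable for this exponent, the cause would have to be something other than FFT length, which this sketch does not model.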

Quote:
Originally Posted by preda View Post
If using gpuowl 7.x, also try with
-use TRIG_COMPUTE=0
or 1, or 2 (2 being the default).
The idea being that 2 does more compute, 0 does more memory accesses, and 1 is intermediary between those two.
Indeed! TRIG_COMPUTE=0 produces the fastest iteration times on this GPU for wavefront exponents. The improvement is substantial: about 40 us/it, or 5% faster, for wavefront PRP iterations. The in-between =1 option is also an improvement, but =0 is best.
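
For anyone repeating the comparison, here is a small Python sketch that only builds the three command lines, one per TRIG_COMPUTE value, without running anything. The -prp, -iters and -use arguments are the ones that appear in this thread; the ./gpuowl path and the function name are placeholders.

```python
import shlex

def trig_compute_commands(exponent, iters, binary="./gpuowl"):
    """Build one command line per TRIG_COMPUTE value (0, 1 and 2).

    `binary` is a placeholder path; adjust it to the local build."""
    return [
        [binary, "-prp", str(exponent), "-iters", str(iters),
         "-use", f"TRIG_COMPUTE={value}"]
        for value in (0, 1, 2)
    ]

for command in trig_compute_commands(77936867, 20000):
    print(shlex.join(command))
```

As the logs earlier in the thread show, a second run on the same exponent resumes from the saved checkpoint, so each timing covers a different iteration range; that should not matter much for us/it comparisons.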

Quote:
Originally Posted by Prime95 View Post
Use the -time command line argument.
Using this argument reduces iteration time on wavefront PRP by ~5 us! Is that time then "lost" producing the timing results? How might I interpret these results and use them to tune?
Old 2022-06-24, 14:45   #2772
Zhangrc
 
"University student"
May 2021
Beijing, China

100001101₂ Posts
Why does GPUowl do some iterations beyond the exponent?

Code:
20220624 14:38:50 Tesla T4-0 113032481 113030000 81f68225_______ 6390
20220624 14:39:06 Tesla T4-0 113032481 CC 113032481 / 113032481, 6ccf666_________
20220624 14:39:11 Tesla T4-0 113032481 OK 113032800 100.00% 6c317__________ 6556 us/it + check 2.68s + save 0.00s; ETA 00:00 1 errors
AFAIK, we only compute 3^(2^p) mod (2^p-1), so we need at most p iterations to get the final residue.
So why does GPUowl do some iterations beyond the exponent?
This is the first time I completed an exponent with GPUowl.
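
One observation from the logs, offered as a possible explanation rather than a confirmed answer: the earlier log for this exponent reports "blockSize 400", and the final OK line at 113032800 is exactly the first multiple of 400 at or above the exponent. That would be consistent with the OK check only happening on block boundaries, with the CC line at 113032481 marking the actual end of the test. A minimal Python sketch of the arithmetic (the function name is hypothetical):

```python
def next_block_boundary(exponent, block_size):
    """Smallest multiple of block_size that is >= exponent."""
    return -(-exponent // block_size) * block_size

p = 113032481
boundary = next_block_boundary(p, 400)
print(boundary, boundary - p)
# 113032800 319
```

If this reading is right, the extra work is 319 iterations, about 2 seconds at ~6500 us/it.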

Last fiddled with by Zhangrc on 2022-06-24 at 15:00 Reason: res64 redacted
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.