[QUOTE=kriesel;535448]Commit changes seem to be rocm-specific.[/QUOTE]
Yes as I said, I tested (i.e. measured) with ROCm 2.10. Should not regress on other platforms, but I'm looking for feedback on this. If a regression is detected (e.g. on Nvidia) I'll switch the change on/off as appropriate. |
[QUOTE=preda;535450]Yes as I said, I tested (i.e. measured) with ROCm 2.10. Should not regress on other platforms, but I'm looking for feedback on this. If a regression is detected (e.g. on Nvidia) I'll switch the change on/off as appropriate.[/QUOTE]
"Coded for" and "tested on" are two separate things. Thanks for all you do.
This is timely, as I was just considering running a slew of GPU models through the minor gpuowl updates and the updated -use option timing scripts on PRP. |
gpuowl-v6.11-134-g1e0ce1d Windows x64 build
2 Attachment(s)
I have only run -h so far, but here it is. This is the latest commit at the moment, the one that has Preda's P-1 stage 2 tweak.
|
RX550 -use testing
[CODE]gpuowl v6.11-134
RX550 4GB Win7 x64 exponent 92400689 PRP 5M fft -iters 10000 -time
NO_ASM 14491
NO_ASM,UNROLL_ALL 14492
NO_ASM,UNROLL_NONE 14364
NO_ASM,UNROLL_WIDTH 14363
NO_ASM,UNROLL_HEIGHT 14360 *
NO_ASM,UNROLL_MIDDLEMUL1 14412
NO_ASM,UNROLL_MIDDLEMUL2 14363
NO_ASM,UNROLL_WIDTH,UNROLL_HEIGHT 14369
NO_ASM,UNROLL_WIDTH,UNROLL_MIDDLEMUL2 14363
NO_ASM,UNROLL_HEIGHT,UNROLL_MIDDLEMUL2 14361
NO_ASM,UNROLL_WIDTH,UNROLL_HEIGHT,UNROLL_MIDDLEMUL2 14362
NO_ASM,MERGED_MIDDLE,WORKINGIN 19729
NO_ASM,MERGED_MIDDLE,WORKINGIN 19730
NO_ASM,MERGED_MIDDLE,WORKINGIN1 14683
NO_ASM,MERGED_MIDDLE,WORKINGIN1A 14573
NO_ASM,MERGED_MIDDLE,WORKINGIN2 14849
NO_ASM,MERGED_MIDDLE,WORKINGIN3 15175
NO_ASM,MERGED_MIDDLE,WORKINGIN4 19404
NO_ASM,MERGED_MIDDLE,WORKINGIN5 14487 *
NO_ASM,MERGED_MIDDLE,WORKINGOUT 32143
NO_ASM,MERGED_MIDDLE,WORKINGOUT0 17920
NO_ASM,MERGED_MIDDLE,WORKINGOUT1 14866
NO_ASM,MERGED_MIDDLE,WORKINGOUT1A 14825
NO_ASM,MERGED_MIDDLE,WORKINGOUT2 14395 *
NO_ASM,MERGED_MIDDLE,WORKINGOUT3 14496
NO_ASM,MERGED_MIDDLE,WORKINGOUT4 15450
NO_ASM,MERGED_MIDDLE,WORKINGOUT5 15736
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE_WIDTH 14554
NO_ASM,MERGED_MIDDLE,%wgkin%,%wkgout%,T2_SHUFFLE_MIDDLE 14319 *
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE_HEIGHT 14364
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE_REVERSELINE 14394
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE 14483
NO_ASM,MERGED_MIDDLE,%wgkin%,%wkgout%,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE 14309 *
NO_ASM,MERGED_MIDDLE,%wgkin%,%wkgout%,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE,T2_SHUFFLE_MIDDLE 18362
NO_ASM,MERGED_MIDDLE,%wgkin%,%wkgout%,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,CARRY32 14326 *
NO_ASM,MERGED_MIDDLE,%wgkin%,%wkgout%,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE,T2_SHUFFLE_MIDDLE,CARRY64 14965
%allotheroptions%,FANCY_MIDDLEMUL1 14320
%allotheroptions%,MORE_SQUARES_MIDDLEMUL1 14398
%allotheroptions%,CHEBYSHEV_METHOD EE on load
%allotheroptions%,CHEBYSHEV_METHOD_FMA EE on load
%allotheroptions%,ORIGINAL_METHOD 14318 *
%allotheroptions%,ORIGINAL_TWEAKED 14321
%allotheroptions%,ORIG_MIDDLEMUL2 14315
%allotheroptions%,CHEBYSHEV_MIDDLEMUL2 14309 *
%allotheroptions%,ORIG_SLOWTRIG 14772
%allotheroptions%,NEW_SLOWTRIG 14306
%allotheroptions%,MORE_ACCURATE 14309
%allotheroptions%,LESS_ACCURATE 14184 *
NO_ASM,UNROLL_HEIGHT,MERGED_MIDDLE,WORKINGIN5,WORKINGOUT2,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,CARRY32,ORIGINAL_METHOD,LESS_ACCURATE 14152 *[/CODE]
14491/14152 = 1.024 |
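The 14491/14152 ratio quoted for the RX550 can be recomputed mechanically from the timing rows. Below is a minimal Python sketch, assuming one row per line in the form "options timing", optionally followed by "*". The helper names `parse_timings` and `speedup` are hypothetical and not part of gpuowl or of the timing script used above. Rows without a plain integer timing (for example "EE on load") are skipped.

```python
import re

# One row: "<comma-separated options> <integer timing>" with an optional trailing "*".
ROW = re.compile(r"^(?P<opts>\S+)\s+(?P<time>\d+)(?:\s*\*)?$")

def parse_timings(text):
    """Return (options, timing) pairs; rows without a single integer timing are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append((match.group("opts"), int(match.group("time"))))
    return rows

def speedup(rows, baseline="NO_ASM"):
    """Return the fastest option string and the ratio baseline timing / best timing."""
    base = next(t for opts, t in rows if opts == baseline)
    best_opts, best = min(rows, key=lambda row: row[1])
    return best_opts, base / best

# A few rows copied from the RX550 results above.
sample = """\
NO_ASM 14491
NO_ASM,UNROLL_HEIGHT 14360 *
%allotheroptions%,CHEBYSHEV_METHOD EE on load
NO_ASM,UNROLL_HEIGHT,MERGED_MIDDLE,WORKINGIN5,WORKINGOUT2,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,CARRY32,ORIGINAL_METHOD,LESS_ACCURATE 14152 *
"""

rows = parse_timings(sample)
best_opts, ratio = speedup(rows)
print(best_opts)
print(round(ratio, 3))
```

Rows that list two timings for one option set (such as "NO_ASM 3372, 3374" in the later tables) do not match this pattern and would need separate handling, for instance averaging the repeats.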
ending in 2
I have just downloaded a bunch of world record PRPs and some end in 0 and others end in 2. Will "program":{"name":"gpuowl", "version":"v6.11-124-g267cc60"} perform P-1 automatically on those ending with "2" or do I need to upgrade gpuOwl?
|
gpuowl v6.11-134-g1e0ce1d RX480 -use option timings
[CODE]gpuowl v6.11-134-g1e0ce1d
RX480 8GB Win7 x64 exponent 92162731 PRP 5M fft -iters 10000 -time
NO_ASM 3372, 3374
NO_ASM,UNROLL_ALL 3375
NO_ASM,UNROLL_NONE 3349
NO_ASM,UNROLL_WIDTH 3351
NO_ASM,UNROLL_HEIGHT 3344 *
NO_ASM,UNROLL_MIDDLEMUL1 3352
NO_ASM,UNROLL_MIDDLEMUL2 3373
NO_ASM,UNROLL_WIDTH,UNROLL_HEIGHT 3337 *
NO_ASM,UNROLL_WIDTH,UNROLL_MIDDLEMUL2 3374
NO_ASM,UNROLL_HEIGHT,UNROLL_MIDDLEMUL2 3365
NO_ASM,UNROLL_WIDTH,UNROLL_HEIGHT,UNROLL_MIDDLEMUL2 3370
NO_ASM,MERGED_MIDDLE,WORKINGIN 5991
NO_ASM,MERGED_MIDDLE,WORKINGIN 6011
NO_ASM,MERGED_MIDDLE,WORKINGIN1 3397 *
NO_ASM,MERGED_MIDDLE,WORKINGIN1A 3426
NO_ASM,MERGED_MIDDLE,WORKINGIN2 3478
NO_ASM,MERGED_MIDDLE,WORKINGIN3 3473
NO_ASM,MERGED_MIDDLE,WORKINGIN4 3821
NO_ASM,MERGED_MIDDLE,WORKINGIN5 3365
NO_ASM,MERGED_MIDDLE,WORKINGOUT 5835
NO_ASM,MERGED_MIDDLE,WORKINGOUT0 4543
NO_ASM,MERGED_MIDDLE,WORKINGOUT1 3352 *
NO_ASM,MERGED_MIDDLE,WORKINGOUT1A 3384
NO_ASM,MERGED_MIDDLE,WORKINGOUT2 3739
NO_ASM,MERGED_MIDDLE,WORKINGOUT3 3365
NO_ASM,MERGED_MIDDLE,WORKINGOUT4 3468
NO_ASM,MERGED_MIDDLE,WORKINGOUT5 3427
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE_WIDTH 3383 *
NO_ASM,MERGED_MIDDLE,%wgkin%,%wkgout%,T2_SHUFFLE_MIDDLE 3394
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE_HEIGHT 3390
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE_REVERSELINE 3395
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE 3436
set allotheroptions=NO_ASM,UNROLL_HEIGHT,UNROLL_WIDTH,WORKINGIN1,WORKINGOUT1
%allotheroptions%,T2_SHUFFLE_WIDTH,T2_SHUFFLE_HEIGHT 3341 *
%allotheroptions%,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_WIDTH 3353
%allotheroptions%,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_WIDTH,T2_REVERSELINE 3351
set allotheroptions=NO_ASM,MERGED_MIDDLE,UNROLL_HEIGHT,UNROLL_WIDTH,WORKINGIN1,WORKINGOUT1,T2_SHUFFLE_WIDTH,T2_SHUFFLE_HEIGHT
%allotheroptions%,CARRY32 3356 *
%allotheroptions%,CARRY64 3479
set allotheroptions=NO_ASM,MERGED_MIDDLE,UNROLL_HEIGHT,UNROLL_WIDTH,WORKINGIN1,WORKINGOUT1,T2_SHUFFLE_WIDTH,T2_SHUFFLE_HEIGHT,CARRY32
%allotheroptions%,FANCY_MIDDLEMUL1 3349
%allotheroptions%,MORE_SQUARES_MIDDLEMUL1 3341 *
%allotheroptions%,CHEBYSHEV_METHOD EE on load
%allotheroptions%,CHEBYSHEV_METHOD_FMA 3350
%allotheroptions%,ORIGINAL_METHOD 3356
%allotheroptions%,ORIGINAL_TWEAKED 3348
%allotheroptions%,ORIG_MIDDLEMUL2 3434
%allotheroptions%,CHEBYSHEV_MIDDLEMUL2 3357 *
%allotheroptions%,ORIG_SLOWTRIG EE
%allotheroptions%,NEW_SLOWTRIG 3360 *
%allotheroptions%,MORE_ACCURATE 3362
%allotheroptions%,LESS_ACCURATE EE
NO_ASM,UNROLL_HEIGHT,UNROLL_WIDTH,MERGED_MIDDLE,WORKINGIN1,WORKINGOUT1,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_WIDTH,CARRY32,MORE_SQUARES_MIDDLEMUL1,CHEBYSHEV_MIDDLEMUL2,NEW_SLOWTRIG[/CODE] |
[QUOTE=paulunderwood;535614]I have just downloaded a bunch of world record PRPs and some end in 0 and others end in 2. Will "program":{"name":"gpuowl", "version":"v6.11-124-g267cc60"} perform P-1 automatically on those ending with "2" or do I need to upgrade gpuOwl?[/QUOTE]
Yes, that version is after the automatic P-1 commit, so you should be fine. However, the latest commit from 2 days ago implements a change that results in a 33% speed improvement in P-1 stage 2. |
[QUOTE=PhilF;535616]Yes, that version is after the automatic P-1 commit, so you should be fine. However, the latest commit from 2 days ago implements a change that results in a 33% speed improvement in P-1 stage 2.[/QUOTE]
Got it! :tu: |
[QUOTE=PhilF;535616]Yes, that version is after the automatic P-1 commit, so you should be fine. However, the latest commit from 2 days ago implements a change that results in a 33% speed improvement in P-1 stage 2.[/QUOTE]
Correction: it's a 33% speed-up of one kernel (tailFusedMulDelta) that was taking up 45% of stage-2 time before. So it's more like a 12% speed-up of stage 2 overall (I hope). |
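The step from "33% on one kernel" to "about 12% overall" follows from Amdahl's law. The Python sketch below is a rough check, assuming that "33% speed-up" means the kernel now runs 1.33 times faster and that the other 55% of stage-2 time is unchanged. The function name `overall_speedup` is hypothetical.

```python
def overall_speedup(fraction, kernel_speedup):
    """Amdahl's law: whole-run speed-up when `fraction` of the run time
    becomes `kernel_speedup` times faster and the rest is unchanged."""
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup)

# Figures from the post: the kernel took 45% of stage-2 time and got 1.33x faster (assumed reading).
s = overall_speedup(0.45, 1.33)
print(f"stage 2 overall: about {(s - 1) * 100:.0f}% faster")
```

Under that reading the result lands between 12% and 13%, in line with the estimate in the post. If "33%" instead meant a 33% reduction in the kernel's run time, the overall gain would come out larger.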
gpuowl v6.11-134-g1e0ce1d -use options on GTX1080
[CODE]gpuowl v6.11-134-g1e0ce1d
GTX1080 8GB Win7 x64 exponent 91996859 PRP 5M fft -iters 10000 -time
NO_ASM 4541, 4560
NO_ASM,UNROLL_ALL 4542
NO_ASM,UNROLL_NONE BUILD_PROGRAM_FAILURE
NO_ASM,UNROLL_WIDTH BUILD_PROGRAM_FAILURE
NO_ASM,UNROLL_HEIGHT BUILD_PROGRAM_FAILURE
NO_ASM,UNROLL_MIDDLEMUL1 BUILD_PROGRAM_FAILURE
NO_ASM,UNROLL_MIDDLEMUL2 BUILD_PROGRAM_FAILURE
NO_ASM,UNROLL_WIDTH,UNROLL_HEIGHT BUILD_PROGRAM_FAILURE
NO_ASM,UNROLL_WIDTH,UNROLL_MIDDLEMUL2 BUILD_PROGRAM_FAILURE
NO_ASM,UNROLL_HEIGHT,UNROLL_MIDDLEMUL2 BUILD_PROGRAM_FAILURE
NO_ASM,UNROLL_WIDTH,UNROLL_HEIGHT,UNROLL_MIDDLEMUL2 4554
NO_ASM,MERGED_MIDDLE,WORKINGIN 4590
NO_ASM,MERGED_MIDDLE,WORKINGIN 5006
NO_ASM,MERGED_MIDDLE,WORKINGIN1 4574
NO_ASM,MERGED_MIDDLE,WORKINGIN1A 4666
NO_ASM,MERGED_MIDDLE,WORKINGIN2 4541
NO_ASM,MERGED_MIDDLE,WORKINGIN3 4548
NO_ASM,MERGED_MIDDLE,WORKINGIN4 4539 *
NO_ASM,MERGED_MIDDLE,WORKINGIN5 4594
NO_ASM,MERGED_MIDDLE,WORKINGOUT 4615
NO_ASM,MERGED_MIDDLE,WORKINGOUT0 4622
NO_ASM,MERGED_MIDDLE,WORKINGOUT1 4587
NO_ASM,MERGED_MIDDLE,WORKINGOUT1A 4654
NO_ASM,MERGED_MIDDLE,WORKINGOUT2 4614
NO_ASM,MERGED_MIDDLE,WORKINGOUT3 4587
NO_ASM,MERGED_MIDDLE,WORKINGOUT4 4555 *
NO_ASM,MERGED_MIDDLE,WORKINGOUT5 4602
NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4 4533 *
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE_WIDTH 4599
NO_ASM,MERGED_MIDDLE,%wgkin%,%wkgout%,T2_SHUFFLE_MIDDLE 4646
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE_HEIGHT 4591
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE_REVERSELINE 4605
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE 4548
set allotheroptions=NO_ASM,UNROLL_ALL,WORKINGIN4,WORKINGOUT4
%allotheroptions%,T2_SHUFFLE_WIDTH,T2_SHUFFLE_HEIGHT 4537
%allotheroptions%,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_WIDTH 4558
%allotheroptions%,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_WIDTH,T2_REVERSELINE 4517 *
set allotheroptions=NO_ASM,MERGED_MIDDLE,UNROLL_ALL,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_WIDTH,T2_REVERSELINE
%allotheroptions%,CARRY32 4698
%allotheroptions%,CARRY64 4559 *
set allotheroptions=NO_ASM,MERGED_MIDDLE,UNROLL_HEIGHT,UNROLL_WIDTH,WORKINGIN1,WORKINGOUT1,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_WIDTH,T2_REVERSELINE,CARRY32
%allotheroptions%,FANCY_MIDDLEMUL1 4531
%allotheroptions%,MORE_SQUARES_MIDDLEMUL1 4542
%allotheroptions%,CHEBYSHEV_METHOD 4492
%allotheroptions%,CHEBYSHEV_METHOD_FMA 4483 *
%allotheroptions%,ORIGINAL_METHOD 4546
%allotheroptions%,ORIGINAL_TWEAKED 4579
%allotheroptions%,ORIG_MIDDLEMUL2 4518
%allotheroptions%,CHEBYSHEV_MIDDLEMUL2 4445 *
%allotheroptions%,ORIG_SLOWTRIG 4552
%allotheroptions%,NEW_SLOWTRIG 4447
%allotheroptions%,MORE_ACCURATE 4438
%allotheroptions%,LESS_ACCURATE 4428 *
-use NO_ASM,UNROLL_ALL,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_WIDTH,T2_REVERSELINE,CARRY64,CHEBYSHEV_METHOD_FMA,CHEBYSHEV_MIDDLEMUL2,LESS_ACCURATE
4550/4428 =~ 1.0276[/CODE] |
gpuowl v6.11-134 -use options testing on Radeon VII
Substantial tuning gained about 3% above program defaults.[CODE]gpuowl v6.11-134-g1e0ce1d
Radeon VII 16GB at 1244MHz gpu clock, 880MHz memory clock Win10 x64 exponent 92561231 PRP 5M fft -iters 10000 -time
NO_ASM 1017
NO_ASM 1014
NO_ASM,UNROLL_ALL 1015
NO_ASM,UNROLL_NONE 1001
NO_ASM,UNROLL_WIDTH 1002
NO_ASM,UNROLL_HEIGHT 1002
NO_ASM,UNROLL_MIDDLEMUL1 1013
NO_ASM,UNROLL_MIDDLEMUL2 989 *
NO_ASM,MERGED_MIDDLE,WORKINGIN 1393
NO_ASM,MERGED_MIDDLE,WORKINGIN 1391
NO_ASM,MERGED_MIDDLE,WORKINGIN1 1035
NO_ASM,MERGED_MIDDLE,WORKINGIN1A 1032
NO_ASM,MERGED_MIDDLE,WORKINGIN2 1038
NO_ASM,MERGED_MIDDLE,WORKINGIN3 1023
NO_ASM,MERGED_MIDDLE,WORKINGIN4 1081
NO_ASM,MERGED_MIDDLE,WORKINGIN5 1010 *
NO_ASM,MERGED_MIDDLE,WORKINGOUT 1177
NO_ASM,MERGED_MIDDLE,WORKINGOUT0 1133
NO_ASM,MERGED_MIDDLE,WORKINGOUT1 1028
NO_ASM,MERGED_MIDDLE,WORKINGOUT1A 1058
NO_ASM,MERGED_MIDDLE,WORKINGOUT2 1117
NO_ASM,MERGED_MIDDLE,WORKINGOUT3 1011 *
NO_ASM,MERGED_MIDDLE,WORKINGOUT4 1042
NO_ASM,MERGED_MIDDLE,WORKINGOUT5 1026
set wkgin=WORKINGIN5
set wkgout=WORKINGOUT3
NO_ASM,UNROLL_WIDTH,UNROLL_HEIGHT 1000
NO_ASM,UNROLL_WIDTH,UNROLL_MIDDLEMUL2 987
NO_ASM,UNROLL_HEIGHT,UNROLL_MIDDLEMUL2 989
NO_ASM,UNROLL_WIDTH,UNROLL_HEIGHT,UNROLL_MIDDLEMUL2 986 *
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE_WIDTH 1017
NO_ASM,MERGED_MIDDLE,%wgkin%,%wkgout%,T2_SHUFFLE_MIDDLE 1022
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE_HEIGHT 1012
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE_REVERSELINE 1011 *
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE 1029
set allotheroptions=NO_ASM,MERGED_MIDDLE,UNROLL_HEIGHT,UNROLL_WIDTH,WORKINGIN5,WORKINGOUT3
%allotheroptions%,T2_SHUFFLE_REVERSELINE,T2_SHUFFLE_HEIGHT 1014 *
%allotheroptions%,T2_SHUFFLE_REVERSELINE,T2_SHUFFLE_WIDTH 1016
%allotheroptions%,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_WIDTH,T2_SHUFFLE_REVERSELINE 1018
%allotheroptions%,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_REVERSELINE 1020
%allotheroptions%,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_WIDTH,T2_SHUFFLE_REVERSELINE 1028
set allotheroptions=NO_ASM,MERGED_MIDDLE,UNROLL_HEIGHT,UNROLL_WIDTH,WORKINGIN1,WORKINGOUT1,T2_SHUFFLE_WIDTH,T2_SHUFFLE_HEIGHT
%allotheroptions%,CARRY32 989 *
%allotheroptions%,CARRY64 1022
set allotheroptions=NO_ASM,MERGED_MIDDLE,UNROLL_HEIGHT,UNROLL_WIDTH,WORKINGIN1,WORKINGOUT1,T2_SHUFFLE_WIDTH,T2_SHUFFLE_HEIGHT,CARRY32
%allotheroptions%,FANCY_MIDDLEMUL1 1011
%allotheroptions%,MORE_SQUARES_MIDDLEMUL1 991
%allotheroptions%,CHEBYSHEV_METHOD 989 *
%allotheroptions%,CHEBYSHEV_METHOD_FMA 989 *
%allotheroptions%,ORIGINAL_METHOD 991
%allotheroptions%,ORIGINAL_TWEAKED 990
%allotheroptions%,ORIG_MIDDLEMUL2 987 *
%allotheroptions%,CHEBYSHEV_MIDDLEMUL2 988
%allotheroptions%,ORIG_SLOWTRIG 1022
%allotheroptions%,NEW_SLOWTRIG 988
%allotheroptions%,MORE_ACCURATE 988
%allotheroptions%,LESS_ACCURATE 986 *
NO_ASM,UNROLL_MIDDLEMUL2,MERGED_MIDDLE,WORKINGIN5,WORKINGOUT3,T2_SHUFFLE_REVERSELINE,T2_SHUFFLE_HEIGHT,CARRY32,CHEBYSHEV_METHOD,ORIG_MIDDLEMUL2,LESS_ACCURATE
repeatability +-1.5/1015.5 = 0.148%
base 1015.5
final 986
ratio 1015.5/986 = 1.030
timing overhead ~986/974-1 =~ .0123[/CODE] |
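The summary figures at the end of the Radeon VII results can be rechecked with a few lines of Python. This is only a sketch: it assumes the base is the mean of the two NO_ASM runs (1017 and 1014), that repeatability is half their spread divided by that mean, and that 974 is a timing for the same configuration measured without the timing option, which the post does not state explicitly. All variable names are illustrative.

```python
from statistics import mean

baseline_runs = [1017, 1014]  # the two NO_ASM timings listed in the table
final = 986                   # timing with the tuned option set
untimed = 974                 # from the "timing overhead" line; its exact meaning is assumed

base = mean(baseline_runs)                                    # 1015.5
half_range = (max(baseline_runs) - min(baseline_runs)) / 2    # 1.5
repeatability = half_range / base   # run-to-run noise relative to the base
ratio = base / final                # gain of the tuned options over defaults
overhead = final / untimed - 1      # extra time attributed to timing

print(f"repeatability {repeatability:.3%}")
print(f"ratio {ratio:.3f}")
print(f"timing overhead {overhead:.4f}")
```

Since the tuning gain (about 3%) is roughly twenty times the estimated repeatability, it is unlikely to be measurement noise, although individual rows that differ by only 1 or 2 counts are within that noise.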