[QUOTE=SELROC;499428]I have opened a new issue:
[url]https://github.com/preda/gpuowl/issues/17[/url][/QUOTE] Thank you! This is most likely caused by the very low bits-per-word of 1.44 bits. I hesitate to simply remove that assert(), because it may be useful in catching other real errors. |
[QUOTE=preda;499435]Thank you! This is most likely caused by the very low bits-per-word of 1.44 bits. I hesitate to simply remove that assert(), because it may be useful in catching other real errors.[/QUOTE]
This happens with the last commit on github. The previous v5.0 was fine in this regard. |
v5.0 perf regression investigation
[QUOTE=SELROC;499418]The fastest version was 3.5, performance regression after that, and little performance recovery in 4.6.[/QUOTE]
Valerio: could you please prepare a speed comparison between "the fastest" (3.5) and "head" (5.0, with B1=0 (default)), on a FFT 5120K exponent (an exponent around 89M), using ROCm 1.9.1 if you can (i.e. not amdgpu-pro), and any GPU (probably RX580). Maybe you can also get GPU power information (reported by rocm-smi) in the two cases. Maybe switch between the different FFT 5120K variants on 5.0 and select the fastest.
Ken, if you have it handy, maybe I could get similar information from you (with these differences: not ROCm, but just specify the driver you use; and different GPU, that's fine; and use your fastest as baseline, not necessarily 3.5).
I'm limited in my analysis because right now I have ONLY Vega64 to test on. Thus any perf testing I do of this problem will be partially "in the dark" if it does not manifest in the same way on Vega64.
Thanks, Mihai |
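For anyone collecting the numbers requested above, here is a minimal sketch of how two runs could be compared once the per-iteration time and the GPU power have been read off by hand. The `Run` class, its field names, and the sample values are all hypothetical placeholders and not measurements from this thread; the sketch also assumes the power reading is a steady average in watts.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One benchmark run; every field is filled in by hand from your own logs."""
    label: str          # e.g. "v3.5" or "v5.0 head" (free-form)
    ms_per_iter: float  # time per iteration in milliseconds
    watts: float        # average GPU power during the run, in watts

    @property
    def joules_per_iter(self) -> float:
        # energy = power * time; convert milliseconds to seconds first
        return self.watts * self.ms_per_iter / 1000.0


def slowdown_percent(baseline: Run, candidate: Run) -> float:
    """Positive result: candidate is slower than baseline by that many percent."""
    return (candidate.ms_per_iter / baseline.ms_per_iter - 1.0) * 100.0


if __name__ == "__main__":
    # Placeholder values only, chosen to make the arithmetic easy to follow.
    baseline = Run("baseline", ms_per_iter=4.0, watts=100.0)
    head = Run("head", ms_per_iter=5.0, watts=100.0)
    print(f"{head.label} vs {baseline.label}: "
          f"{slowdown_percent(baseline, head):+.1f}% time per iteration")
    for run in (baseline, head):
        print(f"{run.label}: {run.joules_per_iter:.3f} J per iteration")
```

Reporting energy per iteration next to time per iteration helps separate a version that is slower from one that merely draws less power.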
[QUOTE=kriesel;499410]I tested v4.3 a total of one little 11 minute m1257787 run to see if my edit left it still functional. Why would you object to that?[/QUOTE]
No, I absolutely don't object to that; I just wanted to understand what the "advantage" of v4.x vs. head is. There is a slight drawback to testing an older version: if you find a problem in 4.x and report it, I'm unlikely to look into it before it's confirmed that it's still present in head (i.e. maybe it's already fixed). |
[QUOTE=preda;499437]Valerio: could you please prepare a speed comparison between "the fastest" (3.5) and "head" (5.0, with B1=0 (default)), on a FFT 5120K exponent (an exponent around 89M), using ROCm 1.9.1 if you can (i.e. not amdgpu-pro), and any GPU (probably RX580). Maybe you can also get GPU power information (reported by rocm-smi) in the two cases. Maybe switch between the different FFT 5120K variants on 5.0 and select the fastest.
Ken, if you have it handy, maybe I could get similar information from you (with these differences: not ROCm, but just specify the driver you use; and different GPU, that's fine; and use your fastest as baseline, not necessarily 3.5). I'm limited in my analysis because right now I have ONLY Vega64 to test on. Thus any perf testing I do of this problem will be partially "in the dark" if it does not manifest in the same way on Vega64. Thanks, Mihai[/QUOTE] I will try to do it this afternoon. To be precise, I am now running ROCm 1.9.211. |
[QUOTE=SELROC;499436]This happens with the last commit on github. The previous v5.0 was fine in this regard.[/QUOTE]
Oh that's interesting... So before you could run this low exponent without errors?.. I wonder what changed affecting this. |
[QUOTE=preda;499440]Oh that's interesting... So before you could run this low exponent without errors?.. I wonder what changed affecting this.[/QUOTE]
Sorry, I reread the logs and it was a different exponent: I am talking about the prime exponent 859433. To be sure that I am making no confusion, I will restart a series of tests this afternoon and test both exponents, 859433 and 756839 with both versions. See you later. |
1 Attachment(s)
[QUOTE=preda;499437]Valerio: could you please prepare a speed comparison between "the fastest" (3.5) and "head" (5.0, with B1=0 (default)), on a FFT 5120K exponent (an exponent around 89M), using ROCm 1.9.1 if you can (i.e. not amdgpu-pro), and any GPU (probably RX580). Maybe you can also get GPU power information (reported by rocm-smi) in the two cases. Maybe switch between the different FFT 5120K variants on 5.0 and select the fastest.
Ken, if you have it handy, maybe I could get similar information from you (with these differences: not ROCm, but just specify the driver you use; and different GPU, that's fine; and use your fastest as baseline, not necessarily 3.5).[/QUOTE] See the attachment for driver info. (Pay no attention to the 0 MHz GPU clock, which is a known issue with GPU-Z since 2.8 or earlier and still present in the current 2.14 in combination with Windows 7 x64 Pro, Windows remote desktop use, and AMD GPUs, at least the RX480 or RX550.) I've been running that driver version since it was found to be required for gpuOwL [B]v2.0[/B], per [URL]https://www.mersenneforum.org/showpost.php?p=488535&postcount=2[/URL], so it was used for all v3.3 and later timings and production runs, as well as some v1.9 runs. |
New smaller FFT sizes added, to support even SMALLER exponents!
Anyway, I recommend you should not test with bits-per-word < 1.5. The behavior there is "undefined", i.e. if there are asserts or worse on bits-per-word < 1.5, that's acceptable. In the future I'll probably add a check to enforce this lower limit. [QUOTE=SELROC;499441] To be sure that I am making no confusion, I will restart a series of tests this afternoon and test both exponents, 859433 and 756839 with both versions. [/QUOTE] |
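As a rough illustration of the limit discussed above, bits-per-word is simply the exponent divided by the number of FFT words. The sketch below is not gpuowl code: the function names are hypothetical, and the 512K-word FFT size is an assumption, since the thread does not say which FFT size was used for these exponents.

```python
MIN_BITS_PER_WORD = 1.5  # lower limit recommended in the post above


def bits_per_word(exponent: int, fft_words: int) -> float:
    """Average number of bits of the exponent carried by each FFT word."""
    return exponent / fft_words


def check_bits_per_word(exponent: int, fft_words: int,
                        minimum: float = MIN_BITS_PER_WORD) -> float:
    """Return bits-per-word, or raise ValueError if it is below the minimum."""
    bpw = bits_per_word(exponent, fft_words)
    if bpw < minimum:
        raise ValueError(
            f"exponent {exponent} at {fft_words} words gives "
            f"{bpw:.2f} bits/word, below the {minimum} limit")
    return bpw


if __name__ == "__main__":
    fft_512k = 512 * 1024  # ASSUMED FFT size in words; not stated in the thread
    for exponent in (859433, 756839):
        try:
            bpw = check_bits_per_word(exponent, fft_512k)
            print(f"{exponent}: {bpw:.2f} bits/word, OK")
        except ValueError as err:
            print(f"rejected: {err}")
```

Under that 512K assumption, 756839 works out to about 1.44 bits per word, which matches the figure mentioned earlier in the thread, while 859433 comes to about 1.64 and stays above the 1.5 limit.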
[QUOTE=preda;499449]New smaller FFT sizes added, to support even SMALLER exponents!
Anyway, I recommend you should not test with bits-per-word < 1.5. The behavior there is "undefined", i.e. if there are asserts or worse on bits-per-word < 1.5, that's acceptable. In the future I'll probably add a check to enforce this lower limit.[/QUOTE] Version #? The README has not been updated in 5 months. Documentation matters. |
2 Attachment(s)
[QUOTE=preda;499449]New smaller FFT sizes added, to support even SMALLER exponents!
Anyway, I recommend you should not test with bits-per-word < 1.5. The behavior there is "undefined", i.e. if there are asserts or worse on bits-per-word < 1.5, that's acceptable. In the future I'll probably add a check to enforce this lower limit.[/QUOTE] Here are the quick tests on the prime exponent 859433. |