mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GpuOwl (https://www.mersenneforum.org/forumdisplay.php?f=171)
-   -   gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

preda 2018-11-03 09:41

[QUOTE=SELROC;499428]I have opened a new issue:


[url]https://github.com/preda/gpuowl/issues/17[/url][/QUOTE]

Thank you! This is most likely caused by the very low bits-per-word value of 1.44. I hesitate to simply remove that assert(), because it may be useful for catching other, real errors.
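
For reference, bits-per-word is the exponent divided by the number of FFT words. The sketch below shows the arithmetic in Python. It is illustrative only and is not gpuOwL code: the function name `bits_per_word` is hypothetical, and the 512K-word FFT size is an assumption (it is not stated in this thread), chosen because it reproduces the 1.44 figure for exponent 756839, which is mentioned later in this thread.

```python
def bits_per_word(exponent: int, fft_words: int) -> float:
    """Average number of bits of the exponent carried by each FFT word."""
    if fft_words <= 0:
        raise ValueError("fft_words must be positive")
    return exponent / fft_words


# Assumed FFT size: 512K words. This value is a guess, not taken from the thread.
FFT_512K = 512 * 1024

for p in (756839, 859433):
    print(p, round(bits_per_word(p, FFT_512K), 2))
```

Under that assumption, 756839 falls below 1.5 bits per word while 859433 stays above it.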

SELROC 2018-11-03 09:44

[QUOTE=preda;499435]Thank you! this is caused most likely by the very low bits-per-word of 1.44bits. I hesitate to simply remove that assert(), because it may be useful in catching other real errors.[/QUOTE]


This happens with the latest commit on GitHub. The previous v5.0 was fine in this regard.

preda 2018-11-03 10:00

v5.0 perf regression investigation
 
[QUOTE=SELROC;499418]The fastest version was 3.5, performance regression after that, and little performance recovery in 4.6.[/QUOTE]

Valerio: could you please prepare a speed comparison between "the fastest" (3.5) and "head" (5.0, with the default B1=0) on an FFT 5120K exponent (i.e. an exponent around 89M), using ROCm 1.9.1 if you can (i.e. not amdgpu-pro), on any GPU (probably the RX580)? If possible, also record the GPU power reported by rocm-smi in the two cases, and on 5.0 try the different FFT 5120K variants and use the fastest one.

Ken, if you have it handy, could I get similar information from you? The differences: no ROCm needed, just specify the driver you use; a different GPU is fine; and use your fastest version as the baseline, not necessarily 3.5.

I'm limited in my analysis because right now I have ONLY Vega64 to test on. Thus any perf testing I do of this problem will be partially "in the dark" if it does not manifest in the same way on Vega64.

Thanks,
Mihai

preda 2018-11-03 10:08

[QUOTE=kriesel;499410]I tested v4.3 a total of one little 11 minute m1257787 run to see if my edit left it still functional. Why would you object to that?[/QUOTE]

No, I absolutely don't object to that; I just wanted to understand what the "advantage" of v4.x over head is.

There is a slight drawback to testing an older version: if you find a problem in 4.x and report it, I'm unlikely to look into it until it's confirmed to still be present in head (i.e. it may already be fixed).

SELROC 2018-11-03 10:10

[QUOTE=preda;499437]Valerio: could you please prepare a speed comparison between "the fastest" (3.5) and "head" (5.0, with B1=0 (default)), on a FFT 5120K exponent (an exponent around 89M), using ROCm 1.9.1 if you can (i.e. not amdgpu-pro), and any GPU (probably RX580). Maybe you can also get GPU power information (reported by rocm-smi) in the two cases. Maybe switch between the different FFT 5120K variants on 5.0 and select the fastest.

Ken, if you have it handy, maybe I could get similar information from you (with these differences: not ROCm, but just specify the driver you use; and different GPU, that's fine; and use your fastest as baseline, not necessarily 3.5).

I'm limited in my analysis because right now I have ONLY Vega64 to test on. Thus any perf testing I do of this problem will be partially "in the dark" if it does not manifest in the same way on Vega64.

Thanks,
Mihai[/QUOTE]


I will try to do it this afternoon.
To be precise, I am now running ROCm 1.9.211.

preda 2018-11-03 10:13

[QUOTE=SELROC;499436]This happens with the last commit on github. The previous v5.0 was fine in this regard.[/QUOTE]

Oh, that's interesting... So before, you could run this low exponent without errors? I wonder what change affected this.

SELROC 2018-11-03 10:24

[QUOTE=preda;499440]Oh that's interesting... So before you could run this low exponent without errors?.. I wonder what changed affecting this.[/QUOTE]


Sorry, I am rereading the logs and it was another exponent.
I was talking about the prime exponent 859433.


To make sure I am not mixing things up, I will rerun a series of tests this afternoon and test both exponents, 859433 and 756839, with both versions.


See you later.

kriesel 2018-11-03 12:06

1 Attachment(s)
[QUOTE=preda;499437]Valerio: could you please prepare a speed comparison between "the fastest" (3.5) and "head" (5.0, with B1=0 (default)), on a FFT 5120K exponent (an exponent around 89M), using ROCm 1.9.1 if you can (i.e. not amdgpu-pro), and any GPU (probably RX580). Maybe you can also get GPU power information (reported by rocm-smi) in the two cases. Maybe switch between the different FFT 5120K variants on 5.0 and select the fastest.

Ken, if you have it handy, maybe I could get similar information from you (with these differences: not ROCm, but just specify the driver you use; and different GPU, that's fine; and use your fastest as baseline, not necessarily 3.5).[/QUOTE]
See the attachment for driver info. (Pay no attention to the 0 MHz GPU clock; this is a known issue with GPU-Z since 2.8 or earlier, still present in the current 2.14, in combination with Windows 7 x64 Pro, Windows Remote Desktop use, and AMD GPUs, at least the RX480 or RX550.)
I've been running that driver version since it was found to be required for gpuOwL [B]v2.0[/B], per [URL]https://www.mersenneforum.org/showpost.php?p=488535&postcount=2[/URL], so it was used for all timings and production runs on V3.3 and up, as well as for some V1.9 runs.

preda 2018-11-03 12:19

New smaller FFT sizes added, to support even SMALLER exponents!

Anyway, I recommend not testing with bits-per-word < 1.5. The behavior there is "undefined", i.e. if there are asserts or worse with bits-per-word < 1.5, that's acceptable.

In the future I'll probably add a check to enforce this lower limit.
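
Such a check could look roughly like the Python sketch below. This only illustrates the idea and is not the gpuOwL implementation: the names `MIN_BITS_PER_WORD` and `check_bits_per_word` are hypothetical, and the FFT sizes in the usage lines are assumptions (512K and 5120K words).

```python
# Lower limit recommended above; below it the behavior is "undefined".
MIN_BITS_PER_WORD = 1.5


def check_bits_per_word(exponent: int, fft_words: int) -> float:
    """Return exponent / fft_words, rejecting combinations below the limit."""
    bpw = exponent / fft_words
    if bpw < MIN_BITS_PER_WORD:
        raise ValueError(
            f"exponent {exponent} with {fft_words} FFT words gives "
            f"{bpw:.2f} bits-per-word, below the limit of {MIN_BITS_PER_WORD}"
        )
    return bpw


# Usage with assumed FFT sizes (512K and 5120K words):
print(f"{check_bits_per_word(859433, 512 * 1024):.2f}")       # accepted
print(f"{check_bits_per_word(89_000_000, 5120 * 1024):.2f}")  # accepted
try:
    check_bits_per_word(756839, 512 * 1024)
except ValueError as err:
    print("rejected:", err)
```

Raising an error up front, rather than relying on a later assert(), would make it clear to the user that the exponent is simply too small for the selected FFT size.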

[QUOTE=SELROC;499441]
To be sure that I am making no confusion, I will restart a series of tests this afternoon and test both exponents, 859433 and 756839 with both versions.
[/QUOTE]

kriesel 2018-11-03 12:53

[QUOTE=preda;499449]New smaller FFT sizes added, to support even SMALLER exponents!

Anyway, I recommend you should not test with bits-per-word < 1.5. The behavior there is "undefined", i.e. if there are asserts or worse on bits-per-word < 1.5, that's acceptable.

In the future I'll probably add a check to enforce this lower limit.[/QUOTE]
Which version number? The README has not been updated in 5 months. Documentation matters.

SELROC 2018-11-03 13:05

2 Attachment(s)
[QUOTE=preda;499449]New smaller FFT sizes added, to support even SMALLER exponents!

Anyway, I recommend you should not test with bits-per-word < 1.5. The behavior there is "undefined", i.e. if there are asserts or worse on bits-per-word < 1.5, that's acceptable.

In the future I'll probably add a check to enforce this lower limit.[/QUOTE]

Here are the quick tests on the prime exponent 859433.

