[QUOTE=preda;499437]Valerio: could you please prepare a speed comparison between "the fastest" (3.5) and "head" (5.0, with B1=0 (default)), on a FFT 5120K exponent (an exponent around 89M), using ROCm 1.9.1 if you can (i.e. not amdgpu-pro), and any GPU (probably RX580). Maybe you can also get GPU power information (reported by rocm-smi) in the two cases. Maybe switch between the different FFT 5120K variants on 5.0 and select the fastest.
Ken, if you have it handy, maybe I could get similar information from you (with these differences: not ROCm, but just specify the driver you use; and different GPU, that's fine; and use your fastest as baseline, not necessarily 3.5). I'm limited in my analysis because right now I have ONLY Vega64 to test on. Thus any perf testing I do of this problem will be partially "in the dark" if it does not manifest in the same way on Vega64. Thanks, Mihai[/QUOTE] Yesterday I ran only a few quick tests. Apparently Ken has a directory with all the versions of gpuowl; I kept only v3.5 and head. I will retry this afternoon with a new series of tests. |
Link to fft lengths list
I've posted the v5.0-9c13870 fft list output, along with notes about earlier versions' supported fft lengths, at [url]https://www.mersenneforum.org/showpost.php?p=499636&postcount=9[/url]
|
makefile request
Preda, in addition to [CODE]openowl: ${HEADERS} ${SRCS}
	g++-8 -std=c++17 -O2 -DREV=\"`git rev-parse --short HEAD``git diff-files --quiet || echo -mod`\" -Wall ${SRCS} -o openowl -lOpenCL -lgmp -pthread ${LIBPATH}
[/CODE]please add[CODE]openowl-win: ${HEADERS} ${SRCS}
	g++ -std=c++17 -O2 -DREV=\"`git rev-parse --short HEAD``git diff-files --quiet || echo -mod`\" -Wall ${SRCS} -o openowl -lOpenCL -lgmp -pthread ${LIBPATH} -static

openowl-win-nogit: ${HEADERS} ${SRCS}
	g++ -std=c++17 -O2 -DREV=\"\" -Wall ${SRCS} -o openowl -lOpenCL -lgmp -pthread ${LIBPATH} -static
[/CODE]to your standard V5.0 makefile. It would save some editing at every commit here. |
Another data point
I just tested V5.0-9c13870 downloaded from post # 869 on a RX 580 and it was 3.7% slower than 3.8.
I will look into overclocking the 580 a little to compensate. |
[QUOTE=tServo;499671]I just tested V5.0-9c13870 downloaded from post # 869 on a RX 580 and it was 3.7% slower than 3.8.
I will look into overclocking the 580 a little to compensate.[/QUOTE] Marv, I assume you're on Windows, thus not using ROCm? |
[QUOTE=tServo;499671]I just tested V5.0-9c13870 downloaded from post # 869 on a RX 580 and it was 3.7% slower than 3.8.
I will look into overclocking the 580 a little to compensate.[/QUOTE] tServo, What exponent or fft length did you run the comparison on? If you would provide also driver version and ms/sq numbers, and OS, for your recent V5.0-9c13870 test run, that could provide an OS to OS comparison on same gpu model as SELROC, which could be informative and useful. Post 869 is a Windows executable. It's a fat executable, >1.5MB. (I did not apply strip to it like kracker recommended optionally back at v2.0.) Strip gets that commit down under 0.5MB executable size, and only affects file size, not iteration speed. |
possible Windows AMD driver issue affecting GPU-Z
After reporting the following issue to the authors of GPU-Z several times, for ~V2.7.0 through 2.14.0, without resolution, I have submitted it as an issue with the latest available AMD Adrenalin driver for Windows, v18.10.2. With Windows 7 x64 Pro, on a system with one or more RX480 or RX550 gpus installed, run GPU-Z during local console access. All parameters display ok. Switch to accessing that system via Windows Remote Desktop. Upon the switch to remote desktop, in all running sessions of GPU-Z, the GPU Core clock and GPU memory clock both drop to indicated values of zero; gpu temperature drops out to null degrees. Same system type (HP Z600, Windows 7 X64 Pro, same amount of memory etc) but NVIDIA gpus, no such issue. But it was also an issue with earlier AMD drivers.
|
[QUOTE=kriesel;499674]tServo,
What exponent or fft length did you run the comparison on? If you would provide also driver version and ms/sq numbers, and OS, for your recent V5.0-9c13870 test run, that could provide an OS to OS comparison on same gpu model as SELROC, which could be informative and useful. Post 869 is a Windows executable. It's a fat executable, >1.5MB. (I did not apply strip to it like kracker recommended optionally back at v2.0.) Strip gets that commit down under 0.5MB executable size, and only affects file size, not iteration speed.[/QUOTE] Here are the requested data:
OS: Windoze 10, 18.03, current to within a few months
Driver: AMD Adrenalin 17.7 (see below)
Exponent tested: 87,3xxx,xxx
FFT size: 5120K
ms/sq: 4.52 (for 3.8 it is 4.32)
Note that the ms/sq difference is 4.4%, whereas yesterday I reported a 3.7% difference. The 3.7% was based on the ETA difference between the two versions. The AMD driver is old, probably not updated since I got the machine. I will update tomorrow and report the new times, if any. I'm skeptical there will be much difference, because my impression is that both AMD and Nvidia pay lots of attention in their drivers to the performance of the latest and greatest video games, and perhaps to BSOD complaints, but not much else. |
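For anyone comparing these percentages: the size of the gap depends on which timing is used as the baseline. A minimal sketch of the arithmetic, using only the two ms/sq values from the post above (4.52 for V5.0, 4.32 for 3.8). The function names are made up for illustration and are not part of gpuowl.

```python
def pct_slower(new_ms, old_ms):
    """Extra time per squaring, as a percentage of the old (faster) timing."""
    return (new_ms - old_ms) / old_ms * 100.0


def pct_of_new(new_ms, old_ms):
    """The same gap, expressed as a percentage of the new (slower) timing."""
    return (new_ms - old_ms) / new_ms * 100.0


v50_ms, v38_ms = 4.52, 4.32  # ms/sq values quoted in the post

print(round(pct_slower(v50_ms, v38_ms), 1))  # 4.6
print(round(pct_of_new(v50_ms, v38_ms), 1))  # 4.4
```

The 4.4% figure in the post matches the second form, i.e. the difference taken relative to the V5.0 timing.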
New AMD driver results
Installing the AMD driver 18.10 shows the two versions almost the same:
3.8: 4.52 -> 4.54
5.0: 4.52 -> 4.53
I will probably apply about a 5% overclock in a week and see how much that improves it. However, RX 580s are notoriously difficult to overclock. If the overclock jacks the power consumption too much, I will back it off because the extra cost for power won't justify a small increase in speed. |
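One possible way to judge the overclock trade-off mentioned above is energy per squaring (board power times ms/sq). A rough sketch only: the wattages are hypothetical placeholders, not measurements from this thread, and the assumption that a 5% overclock yields a 5% faster ms/sq is also just for illustration. Only the 4.53 ms/sq figure comes from the post.

```python
def joules_per_sq(power_watts, ms_per_sq):
    """Energy spent on one squaring: watts times seconds."""
    return power_watts * ms_per_sq / 1000.0


def overclock_saves_energy(base_w, base_ms, oc_w, oc_ms):
    """True if the overclocked setting uses less energy per squaring."""
    return joules_per_sq(oc_w, oc_ms) < joules_per_sq(base_w, base_ms)


base_ms = 4.53           # ms/sq for 5.0 from the post
oc_ms = base_ms / 1.05   # assumption: 5% overclock gives 5% faster squarings

# Hypothetical power draws (watts), chosen only to show both outcomes.
print(overclock_saves_energy(120.0, base_ms, 135.0, oc_ms))  # False
print(overclock_saves_energy(120.0, base_ms, 121.0, oc_ms))  # True
```

Actual power numbers would have to come from a tool such as GPU-Z or rocm-smi on the card in question.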
PRP 89m completed on Win7 x64, gpuowl V5.0-9c13870
[url]https://www.mersenne.org/report_exponent/?exp_lo=89000167&full=1[/url] with the Adrenalin 18.10.2 driver. (Base=3 indicates no P-1 in the PRP run.)
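For a sense of scale, a PRP test needs roughly as many squarings as the exponent, so the run time can be estimated from ms/sq. A rough sketch: the exponent is the one linked above, but the 4.52 ms/sq is borrowed from the RX 580 figure reported earlier in the thread and is not the timing of this particular run; the function name is illustrative.

```python
def eta_days(exponent, ms_per_sq):
    """Approximate days for a PRP test, assuming about `exponent` squarings."""
    seconds = exponent * ms_per_sq / 1000.0
    return seconds / 86400.0


# Exponent from the linked result; ms/sq borrowed from a different GPU's report.
print(round(eta_days(89000167, 4.52), 2))  # 4.66
```

This ignores checkpointing, error-check overhead, and any time the GPU is idle.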
|
[QUOTE=preda;499580]I just added an FFT-3 "middle" step.[/QUOTE]
Here is a Debianized version of gpuowl 5.0: [url]https://drive.google.com/file/d/1MvWBK5ArXDcnEqCDjpa8nDgLJIhCnzWr/view?usp=sharing[/url] To install it, issue the command: [CODE]dpkg -i gpuowl.deb[/CODE] |