mersenneforum.org

gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

SELROC 2018-11-05 07:18

[QUOTE=preda;499437]Valerio: could you please prepare a speed comparison between "the fastest" (3.5) and "head" (5.0, with B1=0 (default)), on a FFT 5120K exponent (an exponent around 89M), using ROCm 1.9.1 if you can (i.e. not amdgpu-pro), and any GPU (probably RX580). Maybe you can also get GPU power information (reported by rocm-smi) in the two cases. Maybe switch between the different FFT 5120K variants on 5.0 and select the fastest.

Ken, if you have it handy, maybe I could get similar information from you (with these differences: not ROCm, but just specify the driver you use; and different GPU, that's fine; and use your fastest as baseline, not necessarily 3.5).

I'm limited in my analysis because right now I have ONLY Vega64 to test on. Thus any perf testing I do of this problem will be partially "in the dark" if it does not manifest in the same way on Vega64.

Thanks,
Mihai[/QUOTE]




Yesterday I did only some quick tests. Apparently Ken has a directory with all the versions of gpuowl; I kept only v3.5 and head.

I will retry this afternoon with a new series of tests.

kriesel 2018-11-05 13:36

Link to FFT lengths list
 
I've posted the v5.0-9c13870 FFT list output, along with notes about earlier versions' supported FFT lengths, at [url]https://www.mersenneforum.org/showpost.php?p=499636&postcount=9[/url]
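As a rough sanity check on why exponents around 89M land on the 5120K FFT, one can compute the average number of exponent bits packed into each FFT word. This is a minimal sketch, assuming "K" in an FFT size means 1024 words; the function name is hypothetical, and the real per-FFT exponent limits are the ones in gpuowl's own FFT list, not something derived from this arithmetic.

```python
def bits_per_word(exponent: int, fft_k: int) -> float:
    """Average number of exponent bits carried by each FFT word."""
    fft_words = fft_k * 1024  # assumption: "5120K" means 5120 * 1024 words
    return exponent / fft_words


# An exponent around 89M on the 5120K FFT discussed in this thread.
print(f"{bits_per_word(89_000_167, 5120):.2f} bits/word")  # prints 16.98 bits/word
```

Larger FFTs lower the bits per word, which trades speed for more headroom against round-off error.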

kriesel 2018-11-05 20:33

makefile request
 
Preda, in addition to [CODE]openowl: ${HEADERS} ${SRCS}
	g++-8 -std=c++17 -O2 -DREV=\"`git rev-parse --short HEAD``git diff-files --quiet || echo -mod`\" -Wall ${SRCS} -o openowl -lOpenCL -lgmp -pthread ${LIBPATH}
[/CODE]please add[CODE]# Windows build, statically linked; REV comes from git as above
openowl-win: ${HEADERS} ${SRCS}
	g++ -std=c++17 -O2 -DREV=\"`git rev-parse --short HEAD``git diff-files --quiet || echo -mod`\" -Wall ${SRCS} -o openowl -lOpenCL -lgmp -pthread ${LIBPATH} -static

# Same, for source trees without git metadata: REV is left empty
openowl-win-nogit: ${HEADERS} ${SRCS}
	g++ -std=c++17 -O2 -DREV=\"\" -Wall ${SRCS} -o openowl -lOpenCL -lgmp -pthread ${LIBPATH} -static
[/CODE]to your standard v5.0 makefile. It would save me some editing at every commit.
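For readers unfamiliar with the backtick commands in those recipes: the REV string is the short commit hash, with "-mod" appended when tracked files differ from the index (`git diff-files --quiet` exits non-zero in that case). A minimal Python sketch of the same logic follows; the function names are hypothetical and only mirror what the makefile line does. `current_rev()` needs to run inside a git checkout, which is exactly why a "nogit" target with an empty REV is useful.

```python
import subprocess


def rev_string(short_hash: str, dirty: bool) -> str:
    """Mirror the makefile's REV value: short hash, plus "-mod" for a modified tree."""
    return short_hash + ("-mod" if dirty else "")


def current_rev() -> str:
    """Query git the way the makefile's backtick commands do (requires a git checkout)."""
    short_hash = subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    # `git diff-files --quiet` exits with status 1 when the working tree differs from the index.
    dirty = subprocess.run(["git", "diff-files", "--quiet"]).returncode != 0
    return rev_string(short_hash, dirty)
```

For example, `rev_string("9c13870", True)` gives `9c13870-mod`.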

tServo 2018-11-05 21:21

Another data point
 
I just tested V5.0-9c13870 (downloaded from post #869) on an RX 580, and it was 3.7% slower than v3.8.
I will look into overclocking the 580 a little to compensate.

preda 2018-11-05 21:39

[QUOTE=tServo;499671]I just tested V5.0-9c13870 downloaded from post # 869 on a RX 580 and it was 3.7% slower than 3.8.
I will look into overclocking the 580 a little to compensate.[/QUOTE]

Marv, I assume you're on Windows, thus not using ROCm?

kriesel 2018-11-05 21:53

[QUOTE=tServo;499671]I just tested V5.0-9c13870 downloaded from post # 869 on a RX 580 and it was 3.7% slower than 3.8.
I will look into overclocking the 580 a little to compensate.[/QUOTE]
tServo,

What exponent or FFT length did you run the comparison on?
If you could also provide the driver version, ms/sq figures, and OS for your recent V5.0-9c13870 test run, that would allow an OS-to-OS comparison on the same GPU model as SELROC's, which could be informative and useful.

Post 869 is a Windows executable. It's a fat executable, >1.5MB (I did not run strip on it, as kracker optionally recommended back at v2.0). Strip gets that commit down to under 0.5MB, and it only affects file size, not iteration speed.

kriesel 2018-11-06 02:08

possible Windows AMD driver issue affecting GPU-Z
 
After reporting the following issue to the authors of GPU-Z several times (for roughly V2.7.0 through V2.14.0) without resolution, I have now submitted it as an issue against the latest available AMD Adrenalin driver for Windows, v18.10.2.

With Windows 7 x64 Pro, on a system with one or more RX480 or RX550 GPUs installed, run GPU-Z while at the local console; all parameters display OK. Then switch to accessing that system via Windows Remote Desktop. Upon the switch, in all running sessions of GPU-Z, the GPU core clock and GPU memory clock both drop to indicated values of zero, and the GPU temperature reading goes blank. The same system type (HP Z600, Windows 7 x64 Pro, same amount of memory, etc.) with NVIDIA GPUs shows no such issue. The problem also occurred with earlier AMD drivers.

tServo 2018-11-06 22:20

[QUOTE=kriesel;499674]tServo,

What exponent or fft length did you run the comparison on?
If you would provide also driver version and ms/sq numbers, and OS, for your recent V5.0-9c13870 test run, that could provide an OS to OS comparison on same gpu model as SELROC, which could be informative and useful.

Post 869 is a Windows executable. It's a fat executable, >1.5MB. (I did not apply strip to it like kracker recommended optionally back at v2.0.) Strip gets that commit down under 0.5MB executable size, and only affects file size, not iteration speed.[/QUOTE]

Here are the requested data:

Windoze 10, 18.03, current to within a few months.
AMD Adrenalin driver 17.7 (see below)
Exponent tested is 87,3xx,xxx
FFT size is 5120K
ms/sq is 4.52 (for v3.8 it is 4.32)

Note the ms/sq difference is 4.4%, whereas yesterday I reported a 3.7% difference.
The 3.7% figure was based on the ETA difference between the two versions.
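The size of that percentage depends on which version is taken as the baseline: a 0.20 ms/sq gap is about 4.6% of v3.8's 4.32 but about 4.4% of v5.0's 4.52. A PRP test needs roughly as many squarings as the exponent, so for a given exponent the ETA scales linearly with ms/sq, and an ETA-based comparison from steady-state timings should show the same ratio. This sketch assumes exactly `exponent` squarings; the exponent 87,300,000 is a hypothetical stand-in for the elided "87,3xx,xxx", and the helper names are illustrative only.

```python
def pct_slower(new_ms: float, base_ms: float) -> float:
    """Extra time per squaring, as a percentage of the baseline time."""
    return (new_ms - base_ms) / base_ms * 100


def eta_hours(exponent: int, ms_per_sq: float) -> float:
    """A PRP test needs about `exponent` squarings, so ETA is roughly exponent * ms/sq."""
    return exponent * ms_per_sq / 1000 / 3600


V50_MS, V38_MS = 4.52, 4.32  # ms/sq figures reported in this post
EXPONENT = 87_300_000        # hypothetical stand-in for "87,3xx,xxx"

print(f"v5.0 vs v3.8, relative to v3.8: {pct_slower(V50_MS, V38_MS):.1f}% slower")
print(f"same gap, relative to v5.0:     {(V50_MS - V38_MS) / V50_MS * 100:.1f}%")
print(f"ETA v3.8: {eta_hours(EXPONENT, V38_MS):.1f} h, v5.0: {eta_hours(EXPONENT, V50_MS):.1f} h")
```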

The AMD driver is old, probably not updated since I got the machine.
I will update tomorrow and report the new times, if any.
I'm skeptical there will be much difference because my impression is that
both AMD & Nvidia pay lots of attention in their drivers to the performance of
the latest & greatest video games and perhaps BSOD complaints but not much else.

tServo 2018-11-07 15:34

New AMD driver results
 
After installing AMD driver 18.10, the two versions are almost the same:

3.8 4.52 -> 4.54

5.0 4.52 -> 4.53

I will probably apply about a 5% overclock in a week and see how much that improves it.
However, RX 580s are notoriously difficult to overclock.
If the overclock jacks the power consumption too much,
I will back it off because the extra cost for power won't justify
a small increase in speed.
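Whether an overclock pays for itself can be framed as energy per PRP test: board power times test duration. The sketch below is illustrative only; the wattages are hypothetical placeholders (real figures would come from rocm-smi or GPU-Z), and it assumes a 5% clock increase gives a 5% speedup, which often does not hold when a workload is memory-bound.

```python
def energy_kwh(exponent: int, ms_per_sq: float, watts: float) -> float:
    """Energy for one PRP test: about `exponent` squarings at a steady board power."""
    seconds = exponent * ms_per_sq / 1000
    return watts * seconds / 3600 / 1000  # W*s -> Wh -> kWh


# HYPOTHETICAL inputs for illustration only; not measurements from this thread.
EXPONENT = 87_300_000              # stand-in for an ~87M exponent
STOCK_MS, STOCK_W = 4.52, 150.0    # placeholder board power at stock clocks
OC_MS, OC_W = 4.52 / 1.05, 175.0   # assumes +5% clock gives +5% speed

stock = energy_kwh(EXPONENT, STOCK_MS, STOCK_W)
oc = energy_kwh(EXPONENT, OC_MS, OC_W)
hours_saved = EXPONENT * (STOCK_MS - OC_MS) / 1000 / 3600
print(f"stock: {stock:.2f} kWh/test, overclocked: {oc:.2f} kWh/test")
print(f"time saved per test: {hours_saved:.1f} h")
```

With placeholder numbers like these, the power increase outpaces the speedup, so energy per test goes up; real measurements are needed to decide either way.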

kriesel 2018-11-09 20:43

PRP 89m completed on Win7 x64, gpuowl V5.0-9c13870
 
PRP test of exponent 89000167 completed with the Adrenalin 18.10.2 driver: [url]https://www.mersenne.org/report_exponent/?exp_lo=89000167&full=1[/url] (Base=3 indicates no P-1 in the PRP run.)
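For context on "Base=3": a base-3 Fermat probable-prime test of M_p = 2^p - 1 checks that 3^(M_p - 1) ≡ 1 (mod M_p). Multiplying both sides by 9 gives 3^(2^p) ≡ 9 (mod M_p), which is simply p successive modular squarings starting from 3. The sketch below shows that textbook test with Python big integers. It is not how gpuowl computes it (gpuowl does FFT-based squaring on the GPU, with its own error checking), and it is only practical for small p.

```python
def is_base3_prp(p: int) -> bool:
    """Base-3 Fermat PRP test of the Mersenne number M_p = 2^p - 1.

    If M_p is prime, 3^(M_p - 1) == 1 (mod M_p); multiplying by 9 gives
    3^(2^p) == 9 (mod M_p), i.e. p successive squarings of 3.
    """
    m = (1 << p) - 1
    x = 3
    for _ in range(p):
        x = x * x % m  # one modular squaring per iteration
    return x == 9 % m


for p in (7, 11, 13):
    print(p, is_base3_prp(p))
```

M_11 = 2047 = 23 × 89 is composite and fails the test, while M_7 and M_13 are prime and pass.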

SELROC 2018-11-21 09:19

[QUOTE=preda;499580]I just added an FFT-3 "middle" step.[/QUOTE]




Here is a Debianized version of gpuowl 5.0


[url]https://drive.google.com/file/d/1MvWBK5ArXDcnEqCDjpa8nDgLJIhCnzWr/view?usp=sharing[/url]


To install, run the following command as root:
[CODE]dpkg -i gpuowl.deb
[/CODE]


All times are UTC.
