mersenneforum.org  

2018-11-05, 07:18   #870
SELROC
1111010110₂ Posts

Quote:
Originally Posted by preda
Valerio: could you please prepare a speed comparison between "the fastest" (3.5) and "head" (5.0, with B1=0 (default)), on a FFT 5120K exponent (an exponent around 89M), using ROCm 1.9.1 if you can (i.e. not amdgpu-pro), and any GPU (probably RX580). Maybe you can also get GPU power information (reported by rocm-smi) in the two cases. Maybe switch between the different FFT 5120K variants on 5.0 and select the fastest.

Ken, if you have it handy, maybe I could get similar information from you (with these differences: not ROCm, but just specify the driver you use; and different GPU, that's fine; and use your fastest as baseline, not necessarily 3.5).

I'm limited in my analysis because right now I have ONLY Vega64 to test on. Thus any perf testing I do of this problem will be partially "in the dark" if it does not manifest in the same way on Vega64.

Thanks,
Mihai



Yesterday I ran only a few quick tests. Apparently Ken keeps a directory with every version of gpuowl; I kept only v3.5 and head.

I will run a new series of tests this afternoon.

2018-11-05, 13:36   #871
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5437₁₀ Posts
Link to fft lengths list

I've posted the v5.0-9c13870 fft list output, along with notes about earlier versions' supported fft lengths, at https://www.mersenneforum.org/showpo...36&postcount=9

2018-11-05, 20:33   #872
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
153D₁₆ Posts
makefile request

Preda, in addition to
Code:
openowl: ${HEADERS} ${SRCS}
    g++-8 -std=c++17 -O2 -DREV=\"`git rev-parse --short HEAD``git diff-files --quiet || echo -mod`\" -Wall ${SRCS} -o openowl -lOpenCL -lgmp -pthread ${LIBPATH}
please add
Code:
openowl-win: ${HEADERS} ${SRCS}
    g++ -std=c++17 -O2 -DREV=\"`git rev-parse --short HEAD``git diff-files --quiet || echo -mod`\" -Wall ${SRCS} -o openowl -lOpenCL -lgmp -pthread ${LIBPATH} -static
openowl-win-nogit: ${HEADERS} ${SRCS}
    g++ -std=c++17 -O2 -DREV=\"\" -Wall ${SRCS} -o openowl -lOpenCL -lgmp -pthread ${LIBPATH} -static
to your standard V5.0 makefile. It would save me some editing at every commit.

2018-11-05, 21:21   #873
tServo
"Marv"
May 2009
near the Tannhäuser Gate
1010010010₂ Posts
Another data point

I just tested V5.0-9c13870, downloaded from post #869, on an RX 580, and it was 3.7% slower than 3.8.
I will look into overclocking the 580 a little to compensate.

2018-11-05, 21:39   #874
preda
"Mihai Preda"
Apr 2015
3·457 Posts

Quote:
Originally Posted by tServo
I just tested V5.0-9c13870, downloaded from post #869, on an RX 580, and it was 3.7% slower than 3.8.
I will look into overclocking the 580 a little to compensate.
Marv, I assume you're on Windows, thus not using ROCm?

2018-11-05, 21:53   #875
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,437 Posts

Quote:
Originally Posted by tServo
I just tested V5.0-9c13870, downloaded from post #869, on an RX 580, and it was 3.7% slower than 3.8.
I will look into overclocking the 580 a little to compensate.
tServo,

What exponent or fft length did you run the comparison on?
If you could also provide the driver version, OS, and ms/sq numbers for your recent V5.0-9c13870 test run, that would allow an OS-to-OS comparison on the same gpu model as SELROC's, which could be informative and useful.

Post 869 is a Windows executable. It's a fat executable, >1.5MB. (I did not apply strip to it, as kracker optionally recommended back at v2.0.) Strip gets that commit's executable down under 0.5MB, and affects only file size, not iteration speed.

Last fiddled with by kriesel on 2018-11-05 at 21:58

2018-11-06, 02:08   #876
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
12475₈ Posts
possible Windows AMD driver issue affecting GPU-Z

I reported the following issue to the authors of GPU-Z several times, for roughly V2.7.0 through 2.14.0, without resolution, so I have now submitted it as an issue with the latest available AMD Adrenalin driver for Windows, v18.10.2.

To reproduce: on Windows 7 x64 Pro, on a system with one or more RX480 or RX550 gpus installed, run GPU-Z while logged in at the local console. All parameters display ok. Then switch to accessing that system via Windows Remote Desktop. Upon the switch, in all running sessions of GPU-Z, the indicated GPU core clock and GPU memory clock both drop to zero, and the gpu temperature reading drops out to null.

The same system type (HP Z600, Windows 7 x64 Pro, same amount of memory, etc.) with NVIDIA gpus shows no such issue. The problem also occurred with earlier AMD drivers.

2018-11-06, 22:20   #877
tServo
"Marv"
May 2009
near the Tannhäuser Gate
2×7×47 Posts

Quote:
Originally Posted by kriesel
tServo,

What exponent or fft length did you run the comparison on?
If you could also provide the driver version, OS, and ms/sq numbers for your recent V5.0-9c13870 test run, that would allow an OS-to-OS comparison on the same gpu model as SELROC's, which could be informative and useful.

Post 869 is a Windows executable. It's a fat executable, >1.5MB. (I did not apply strip to it, as kracker optionally recommended back at v2.0.) Strip gets that commit's executable down under 0.5MB, and affects only file size, not iteration speed.
Here are the requested data:

Windoze 10, 18.03 current to within a few months.
AMD Adrenalin driver 17.7 (see below)
exponent tested is 87,3xxx,xxx
FFT size is 5120K
ms/sq is 4.52 (for 3.8 it is 4.32)

Note that the ms/sq figures differ by 4.4%, whereas yesterday I reported a 3.7% difference.
The 3.7% was based on the ETA difference between the two versions.

The AMD driver is old, probably not updated since I got the machine.
I will update it tomorrow and report the new timings, if they change.
I'm skeptical there will be much difference, because my impression is that
both AMD and Nvidia pay lots of attention in their drivers to the performance of
the latest and greatest video games, and perhaps to BSOD complaints, but not much else.

2018-11-07, 15:34   #878
tServo
"Marv"
May 2009
near the Tannhäuser Gate
1222₈ Posts
New AMD driver results

Installing AMD driver 18.10 brings the two versions almost level (ms/sq, old driver -> new driver):

3.8: 4.52 -> 4.54
5.0: 4.52 -> 4.53

I will probably apply about a 5% overclock in a week and see how much that helps.
However, RX 580s are notoriously difficult to overclock.
If the overclock raises power consumption too much,
I will back it off, because the extra cost of power won't justify
a small increase in speed.
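
The trade-off described above can be framed as energy per iteration: an overclock is worthwhile on energy grounds only if throughput rises faster than power draw. A minimal sketch in Python follows; the wattage and speedup figures are made up for illustration, since no power measurements were posted in this thread.

```python
def energy_per_iter_joules(ms_per_iter: float, watts: float) -> float:
    """Energy spent on one iteration: time in seconds times average board power."""
    return ms_per_iter / 1000.0 * watts


def overclock_saves_energy(speedup_pct: float, power_increase_pct: float) -> bool:
    """True if energy per iteration goes down after the overclock."""
    time_factor = 1.0 / (1.0 + speedup_pct / 100.0)   # each iteration gets shorter
    power_factor = 1.0 + power_increase_pct / 100.0   # but each second costs more
    return time_factor * power_factor < 1.0


# Hypothetical figures, for illustration only.
stock = energy_per_iter_joules(ms_per_iter=4.53, watts=120.0)
oc = energy_per_iter_joules(ms_per_iter=4.53 / 1.05, watts=120.0 * 1.12)
print(f"stock: {stock:.3f} J/iter, overclocked: {oc:.3f} J/iter")
print(overclock_saves_energy(5.0, 12.0))  # 5% faster for 12% more power
print(overclock_saves_energy(5.0, 3.0))   # 5% faster for 3% more power
```

Here `speedup_pct` is the gain in iterations per second, which need not equal the clock increase.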

2018-11-09, 20:43   #879
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,437 Posts
PRP 89m completed on Win7 x64, gpuowl V5.0-9c13870

Result page: https://www.mersenne.org/report_expo...9000167&full=1 (run with the Adrenalin 18.10.2 driver). Base=3 indicates no P-1 in the PRP run.
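
For readers unfamiliar with the term, a base-3 PRP test checks whether a power of 3 has the expected residue modulo the Mersenne number. The toy sketch below runs a plain Fermat test to base 3 on small Mersenne numbers with Python's built-in modular exponentiation. It illustrates the idea only and is not gpuowl's implementation, which runs on the GPU with FFT-based squaring; the function name is made up for this example.

```python
def is_base3_prp(p: int) -> bool:
    """Fermat test to base 3 on M_p = 2**p - 1: is 3**(M_p - 1) == 1 (mod M_p)?"""
    m = (1 << p) - 1
    return pow(3, m - 1, m) == 1


# p = 2 is skipped because M_2 = 3 shares a factor with the base.
for p in (3, 5, 7, 11, 13, 17, 19):
    # M_11 = 2047 = 23 * 89 is composite; the other exponents give Mersenne primes.
    print(p, is_base3_prp(p))
```

A number that passes is only a probable prime; a number that fails is certainly composite.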

2018-11-21, 09:19   #880
SELROC
3²·5·191 Posts

Quote:
Originally Posted by preda
I just added an FFT-3 "middle" step.



Here is a Debian package of gpuowl 5.0:

https://drive.google.com/file/d/1MvW...ew?usp=sharing

To install it, run as root:
Code:
dpkg -i gpuowl.deb

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.