mersenneforum.org  

2020-06-02, 23:44   #2256
preda
"Mihai Preda"
Apr 2015
2·19·29 Posts

Quote:
Originally Posted by kriesel
See the error about ambiguous overload
Code:
ProofSet.h:196:77: error: ambiguous overload for 'operator*' (operand types are '__gnu_cxx::__alloc_traits<std::allocator<__gmp_expr<__mpz_struct [1], __mpz_struct [1]> >, __gmp_expr<__mpz_struct [1], __mpz_struct [1]> >::value_type' {aka '__gmp_expr<__mpz_struct [1], __mpz_struct [1]>'} and 'std::array<long long unsigned int, 4>::value_type' {aka 'long long unsigned int'})
  196 |       for (int i = 0; i < (1 << (p - 1)); ++i) { hashes.push_back(hashes[i] * hash[0]); }
Attempted a fix. I'm working a bit in the dark, so please report at least the compiler version:
g++ --version
2020-06-03, 00:18   #2257
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5²·163 Posts
g++ version info etc from msys2 attempts

Quote:
Originally Posted by preda
Attempted a fix. It's a bit in the dark, you should report at least the compiler version
g++ --version
Yeah, it's quite a trick to write for a different environment you don't have. Feel free to build that into the makefile. If I alter the makefile myself, won't the build come back flagged dirty? I'm a little surprised to see the compiler version was not included in the build log, which is a capture of the console output when the makefile runs. Will try another build.
Code:
$ g++ --version
g++.exe (Rev2, Built by MSYS2 project) 9.2.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Code:
$ g++ -v
Using built-in specs.
COLLECT_GCC=C:\msys64\mingw64\bin\g++.exe
COLLECT_LTO_WRAPPER=C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/9.2.0/lto-wrapper.exe
Target: x86_64-w64-mingw32
Configured with: ../gcc-9.2.0/configure --prefix=/mingw64 --with-local-prefix=/mingw64/local --build=x86_64-w64-mingw32 --host=x86_64-w64-mingw32 --target=x86_64-w64-mingw32 --with-native-system-header-dir=/mingw64/x86_64-w64-mingw32/include --libexecdir=/mingw64/lib --enable-bootstrap --with-arch=x86-64 --with-tune=generic --enable-languages=c,lto,c++,fortran,ada,objc,obj-c++ --enable-shared --enable-static --enable-libatomic --enable-threads=posix --enable-graphite --enable-fully-dynamic-string --enable-libstdcxx-filesystem-ts=yes --enable-libstdcxx-time=yes --disable-libstdcxx-pch --disable-libstdcxx-debug --disable-isl-version-check --enable-lto --enable-libgomp --disable-multilib --enable-checking=release --disable-rpath --disable-win32-registry --disable-nls --disable-werror --disable-symvers --enable-plugin --with-libiconv --with-system-zlib --with-gmp=/mingw64 --with-mpfr=/mingw64 --with-mpc=/mingw64 --with-isl=/mingw64 --with-pkgversion='Rev2, Built by MSYS2 project' --with-bugurl=https://sourceforge.net/projects/msys2 --with-gnu-as --with-gnu-ld
Thread model: posix
 gcc version 9.2.0 (Rev2, Built by MSYS2 project)
edit: v6.11-311-gfa76bd9 same problem.
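
One way to get the compiler version into a build log is to capture the `g++ --version` banner and echo it before the build starts. Below is a minimal Python sketch of that idea. The function names are hypothetical, and the parser assumes the first line of the banner ends with the version number, as it does in the MSYS2 output above; other distributions may format that line differently.

```python
import re
import subprocess


def compiler_banner(compiler="g++"):
    """Run `<compiler> --version` and return its text output (hypothetical helper)."""
    result = subprocess.run([compiler, "--version"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def parse_gxx_version(version_output):
    """Return (first_line, version) from `g++ --version` text, or None.

    Assumption: the first line ends with a dotted version such as 9.2.0.
    """
    lines = version_output.strip().splitlines()
    if not lines:
        return None
    match = re.search(r"(\d+\.\d+\.\d+)\s*$", lines[0])
    if not match:
        return None
    return (lines[0], match.group(1))
```

Calling `parse_gxx_version(compiler_banner())` at the top of a build script and printing the result would put the version into the captured console output.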
Attached Files
File Type: txt build-log.txt (10.7 KB, 1 views)

Last fiddled with by kriesel on 2020-06-03 at 00:40 Reason: added build retry outcome
2020-06-03, 00:36   #2258
preda
"Mihai Preda"
Apr 2015
2·19·29 Posts
ROCm 3.5

I tried ROCm 3.5 and I see a performance hit of more than 5% vs. ROCm 3.3.

I opened this issue: https://github.com/RadeonOpenCompute/ROCm/issues/1124

Feel free to +1 the issue if you think it prevents you from using ROCm 3.5. Also, if you do try ROCm 3.5, please add details to that issue. The timing can be easily obtained by running with the "-time" command-line argument, which shows per-kernel timing info.

I personally moved back to 3.3 already.

I was under the impression (I heard it from the ROCm team) that they use gpuowl in their internal regression tools. It's mystifying, then, to see this regression.

Last fiddled with by preda on 2020-06-03 at 00:59
2020-06-03, 04:24   #2259
paulunderwood
Sep 2002
Database er0rr
2²·821 Posts

I managed to get rocm-3.5.0 running after a short battle with Debian Buster, which involved an upgrade, getting some held-back packages installed, an "autoclean", another upgrade, a dist-upgrade, a dpkg -i --force-all, a reboot, linking the libopencl shared object, and recompiling gpuowl.

With 2 instances at 5.5M FFT and sclk at 3, I think it is slower at 1517 µs/it, but with the RAM overclocked to 1200 it is 1423 µs/it.

I just checked. A SLOW DOWN from 1440 µs/it to 1517 µs/it.

Edit: I changed the -L option to 3.5.0 in the Makefile and timings went from 1423 µs/it to 1418 µs/it. That helped!

Another edit: With sclk at 4 I am now getting 1316 µs/it.
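
To put the timings above into percentages, here is a small Python sketch. The helper name is made up for illustration; the numbers are the µs/it figures quoted in this post.

```python
def percent_change(before_us, after_us):
    """Relative change in µs/it, in percent; positive means slower."""
    return 100.0 * (after_us - before_us) / before_us


# The slowdown reported above: 1440 -> 1517 µs/it
print(f"{percent_change(1440, 1517):+.2f}%")  # +5.35%

# After changing the -L option in the Makefile: 1423 -> 1418 µs/it
print(f"{percent_change(1423, 1418):+.2f}%")  # -0.35%
```

The first figure is in line with the "more than 5%" hit mentioned earlier in the thread.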

Last fiddled with by paulunderwood on 2020-06-03 at 11:40
2020-06-03, 17:39   #2260
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5²·163 Posts

Quote:
Originally Posted by preda
ROCm exposes a per-GPU unique_id, e.g.:

Code:
cat /sys/class/drm/card0/device/unique_id 
3044212172dc768c
This id is a property of the GPU itself, and does not depend on the system or PCIe slot. So moving a GPU to a different slot, or to a different system, preserves the UID.

I added a way to specify the GPU to run on by using this unique id:
./gpuowl -uid 3044212172dc768c

This can be used instead of -device (-d), which specifies the device by position in the list of devices. The advantage is that the identity of the GPU is preserved when swapping the PCIe slots.

Combining -uid with -cpu allows associating a stable symbolic name with an actual GPU.

I also added a few small python scripts (ROCm) under the tools/ directory in the source code:
- monitor.py : prints general information about all the ROCm GPUs found
- device.py : given a UID, prints the device serial id

The last script, device.py, can be used in user power-play scripts that set parameters of GPUs (e.g. memory frequency, undervolting, fan etc), to identify GPUs by UID instead of serial-id to achieve a correct GPU identification.
Is that so regardless of gpu model? I note Radeon VII gpus have a serial number built in. (Cpuid hwinfo produced this on Windows. RX480 and RX550 do not have such serial numbers.)
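
The sysfs lookup quoted above can be sketched in Python roughly as follows. This is not the code of the tools/ scripts mentioned in the quote; the function names are hypothetical, and the only thing taken from the post is the path layout /sys/class/drm/cardN/device/unique_id. Cards that expose no id (or an empty one) are simply skipped.

```python
import re
from pathlib import Path


def read_unique_ids(drm_root="/sys/class/drm"):
    """Map card name -> unique_id for every cardN that exposes a non-empty id."""
    ids = {}
    root = Path(drm_root)
    if not root.is_dir():
        return ids
    for card in sorted(root.iterdir()):
        # Only plain cardN entries; skip connector entries such as card0-DP-1.
        if not re.fullmatch(r"card\d+", card.name):
            continue
        try:
            uid = (card / "device" / "unique_id").read_text().strip()
        except OSError:
            continue  # this device does not expose a unique_id file
        if uid:
            ids[card.name] = uid
    return ids


def card_for_uid(uid, drm_root="/sys/class/drm"):
    """Return the card name carrying the given unique id, or None."""
    for card, found in read_unique_ids(drm_root).items():
        if found == uid:
            return card
    return None
```

On a ROCm system, printing `read_unique_ids()` would be a quick way to see which GPU models report an id at all.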
Attached Thumbnails
radeonvii asic serial no.png (34.6 KB)
2020-06-03, 20:03   #2261
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5²·163 Posts
First ram overclock experiment

Upgraded to Windows 10. This allows GPU-Z to function correctly and display clock rates, which is badly broken in Windows 7 with remote desktop.
Then on to install AMD Radeon software to monitor and tweak things.
First try was the automatic RAM overclock, which went to 1200 MHz, clearly too high based on the following. There is an untapped opportunity for gpuowl to detect known-bad residues at each console output step.
Code:
2020-06-03 13:37:23 gpuowl v6.11-292-gecab9ae
2020-06-03 13:37:23 config: -user kriesel -cpu asr2/radeonvii2 -d 2 -use NO_ASM -maxAlloc 15000
2020-06-03 13:37:23 device 2, unique id ''

2020-06-03 13:37:23 asr2/radeonvii2 160708577 FFT: 9M 1K:9:512 (17.03 bpw)
2020-06-03 13:37:23 asr2/radeonvii2 Expected maximum carry32: 2F4C0000
2020-06-03 13:37:25 asr2/radeonvii2 OpenCL args "-DEXP=160708577u -DWIDTH=1024u -DSMALL_HEIGHT=512u -DMIDDLE=9u -DWEIGHT_STEP=0xf.adab9b1c15ad8p-3 -DIWEIGHT_STEP=0x8.2a025c1f5ebcp-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DPM1=0 -DAMDGPU=1 -DNO_ASM=1  -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-06-03 13:37:35 asr2/radeonvii2 OpenCL compilation in 10.37 s
2020-06-03 13:37:36 asr2/radeonvii2 160708577 LL 138500000 loaded: 77e2fa0b44bfb8f1
2020-06-03 13:39:57 asr2/radeonvii2 160708577 LL 138600000  86.24%; 1406 us/it; ETA 0d 08:38; dcde0783ad061fc5
2020-06-03 13:42:10 asr2/radeonvii2 160708577 LL 138700000  86.31%; 1336 us/it; ETA 0d 08:10; 0000000000000002
2020-06-03 13:44:22 asr2/radeonvii2 160708577 LL 138800000  86.37%; 1315 us/it; ETA 0d 08:00; 0000000000000002
2020-06-03 13:46:33 asr2/radeonvii2 160708577 LL 138900000  86.43%; 1315 us/it; ETA 0d 07:58; 0000000000000002
2020-06-03 13:48:50 asr2/radeonvii2 160708577 LL 139000000  86.49%; 1369 us/it; ETA 0d 08:15; 0000000000000002
2020-06-03 13:49:29 asr2/radeonvii2 Stopping, please wait..
2020-06-03 13:49:30 asr2/radeonvii2 160708577 LL 139028000  86.51%; 1410 us/it; ETA 0d 08:29; 0000000000000002
2020-06-03 13:49:30 asr2/radeonvii2 waiting for the Jacobi check to finish..
2020-06-03 13:49:30 asr2/radeonvii2 160708577 EE 139000000 (jacobi == 0)
2020-06-03 13:49:30 asr2/radeonvii2 Exiting because "stop requested"
2020-06-03 13:49:30 asr2/radeonvii2 Bye
Manual RAM clock tuning while watching Jacobi checks and res64 seems to indicate that up to 1150 MHz is ok.
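
As a rough illustration of the known-bad-residue idea, here is a Python sketch that scans a gpuowl log from outside the program. It is not something gpuowl does; the names are hypothetical, the regular expression assumes the progress-line format shown in the log above, and the set of bad values contains only 0000000000000002. That value is a safe pick because 2 is a fixed point of the LL step s → s² − 2, so a run that reaches it never leaves it.

```python
import re

# Assumption: res64 values treated as known-bad for an LL test in progress.
# 2 is a fixed point of s -> s*s - 2, so the run cannot recover from it.
KNOWN_BAD_RES64 = {"0000000000000002"}

# Matches progress lines such as
# "... 160708577 LL 138700000  86.31%; 1336 us/it; ETA 0d 08:10; 0000000000000002"
PROGRESS_RE = re.compile(r"\bLL\s+(\d+)\s+[\d.]+%;.*;\s*([0-9a-f]{16})\s*$")


def first_bad_iteration(log_lines):
    """Return the iteration number of the first known-bad res64, or None."""
    for line in log_lines:
        match = PROGRESS_RE.search(line)
        if match and match.group(2) in KNOWN_BAD_RES64:
            return int(match.group(1))
    return None
```

Run against the log above, this would flag iteration 138700000, about ten minutes before the Jacobi check reported the error.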

The mappings of devices are strange and inconsistent. In gpuowl, it's d0, d1, d2. In AMD Radeon software, it's gpu1, gpu2, gpu3, and adjusting gpu1 changed d2's memory clock. d2 was running the one LL instance, which is the one I was trying to avoid.
Per some online forums, the device numbering is not linear across a motherboard; x16 PCIe slots go before x1 slots. GPU-Z's list order does not match gpuowl's list order either; AMD Radeon software's gpu1 (first) is gpuowl's d2 (last) but GPU-Z's second in the list of 3. And this is with all 3 running on adjacent x1 PCIe slots!

Windows Device Manager may be different yet again. And oddly, while the HD4600 is now -d 3 in the gpuowl list, under Windows 7 it was device 0.
2020-06-04, 00:15   #2262
Xebecer
Jun 2019
Ipswich, MA
32 Posts
Questions about shared folder, and FFT length

Almighty Mathlords, I beseech thee:

I am running four RX 5700 XT GPUs, with a 2990WX, all watercooled. gpuOwl version 6.11 is running well. However, I have two questions:

1. Should one choose to use a shared worktodo/results directory ( -pool ), is there a particular format for the results.txt file? I get an error message that it can't read the file.

2. How exactly does one determine the ideal FFT length?

For the purpose of this discussion, one may assume (but not conclude) that I am a structural engineer and an attorney, not a programmer, and haven't programmed anything since 1992.

Thanks
2020-06-04, 01:08   #2263
ewmayer
2ω=0
Sep 2002
República de California
10110000100101₂ Posts

Quote:
Originally Posted by Xebecer
2. How exactly does one determine the ideal FFT length?
See here for the random-walk-based heuristic I use in my Mlucas code. It gives results which more or less (typically within ~1%, exponent-wise) match the internal tables used by other known-to-be-high-accuracy codes such as Prime95 and gpuOwl. Here is a small C function implementing the same heuristic:
Code:
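/*
For a given FFT length, estimate maximum exponent that can be tested.

This implements formula (8) in the F24 paper (Math Comp. 72 (243), pp.1555-1572,
December 2002) in order to estimate the maximum average wordsize for a given FFT length.
For roughly IEEE64-compliant arithmetic, an asymptotic constant of 0.6 (log2(C) in
the paper, which recommends something around unity) seems to fit the observed data best.
*/
#include <math.h>	/* log() */
#include <stdint.h>
#include <stdio.h>	/* only needed if the debug fprintf below is enabled */

/* Stand-ins for the Mlucas integer typedefs, so the function compiles on its own: */
typedef uint32_t uint32;
typedef uint64_t uint64;

uint64 given_N_get_maxP(uint32 N)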
{
	const double Bmant = 53;
	const double AsympConst = 0.6;
	const double ln2inv = 1.0/log(2.0);
	double ln_N, lnln_N, l2_N, lnl2_N, l2l2_N, lnlnln_N, l2lnln_N;
	double Wbits, maxExp2;

	ln_N     = log(1.0*N);
	lnln_N   = log(ln_N);
	l2_N     = ln2inv*ln_N;
	lnl2_N   = log(l2_N);
	l2l2_N   = ln2inv*lnl2_N;
	lnlnln_N = log(lnln_N);
	l2lnln_N = ln2inv*lnlnln_N;

	Wbits = 0.5*( Bmant - AsympConst - 0.5*(l2_N + l2l2_N) - 1.5*(l2lnln_N) );
	maxExp2 = Wbits*N;
/*
	fprintf(stderr,"N = %8u K  maxP = %10u\n", N>>10, (uint32)maxExp2);
*/
	return (uint64)maxExp2;
}
Alternatively, here is a simple PARI-GP 'script' to return maxExp for any desired N [enter FFT length in #doubles]:
Code:
maxp(N, AsympConst) = { \
	Bmant = 53.; ln2inv = 1.0/log(2.0); \
	ln_N = log(1.0*N); lnln_N = log(ln_N); l2_N = ln2inv*ln_N; lnl2_N = log(l2_N); l2l2_N = ln2inv*lnl2_N; lnlnln_N = log(lnln_N); l2lnln_N = ln2inv*lnlnln_N; \
	Wbits = 0.5*( Bmant - AsympConst - 0.5*(l2_N + l2l2_N) - 1.5*(l2lnln_N) ); \
	return(Wbits*N); \
}
Ex: maxp(2240<<10, 0.4) = 43235170

And here is a *nix bc function (invoke bc in floating-point mode, as 'bc -l'):
Code:
define maxp(bmant, n, asympconst) {
	auto ln2inv, ln_n, lnln_n, l2_n, lnl2_n, l2l2_n, lnlnln_n, l2lnln_n, wbits;
	ln2inv = 1.0/l(2.0);
	ln_n = l(1.0*n); lnln_n = l(ln_n); l2_n = ln2inv*ln_n; lnl2_n = l(l2_n); l2l2_n = ln2inv*lnl2_n; lnlnln_n = l(lnln_n); l2lnln_n = ln2inv*lnlnln_n;
	wbits = 0.5*( bmant - asympconst - 0.5*(l2_n + l2l2_n) - 1.5*(l2lnln_n) );
	return(wbits*n);
}
Ex: maxp(53, 2240*2^10, 0.4) = 43235170.58240592323366420480
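
To turn the heuristic around and answer "which FFT length for my exponent?", one could scan a list of candidate lengths and take the smallest one whose estimated maximum exponent covers the exponent. Here is a Python sketch of that, with the same formula as above. The function `smallest_fft` and the candidate list are illustrative assumptions; real programs use their own tables of supported lengths.

```python
import math


def maxp(n, asymp_const, bmant=53.0):
    """Estimated maximum exponent for FFT length n (in doubles), same formula as above."""
    ln2inv = 1.0 / math.log(2.0)
    ln_n = math.log(n)
    l2_n = ln2inv * ln_n                          # log2(N)
    l2l2_n = ln2inv * math.log(l2_n)              # log2(log2(N))
    l2lnln_n = ln2inv * math.log(math.log(ln_n))  # log2(ln(ln(N)))
    wbits = 0.5 * (bmant - asymp_const - 0.5 * (l2_n + l2l2_n) - 1.5 * l2lnln_n)
    return wbits * n


def smallest_fft(candidates, exponent, asymp_const):
    """Smallest candidate length whose estimated limit covers the exponent, else None."""
    for n in sorted(candidates):
        if maxp(n, asymp_const) >= exponent:
            return n
    return None


# Hypothetical candidate lengths, in doubles (K = 1024).
lengths = [2048 << 10, 2240 << 10, 2304 << 10]
print(smallest_fft(lengths, 43_000_000, 0.4) >> 10, "K")
```

With these candidates and AsympConst = 0.4, an exponent of 43,000,000 is above the estimate for 2048K and below the 43235170 computed for 2240K in the examples above, so 2240K is the length picked.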

Have fun!

Last fiddled with by ewmayer on 2020-06-04 at 01:10
2020-06-04, 01:10   #2264
preda
"Mihai Preda"
Apr 2015
1102₁₀ Posts

Quote:
Originally Posted by Xebecer
Almighty Mathlords, I beseech thee:

I am running four RX 5700 XT GPUs, with a 2990WX, all watercooled. gpuOwl version 6.11 is running well. However, I have two questions:


1. Should one choose to use a shared worktodo/results directory ( -pool ), is there a particular format for the results.txt file? I get an error message that it can't read the file.
If you run multiple instances of gpuowl, -pool can be useful. I do use -pool myself for my runs. You need to create a directory that you pass to -pool (the directory must exist). You have to put work into pool/worktodo.txt (primenet.py can do that). Next, start gpuowl with -pool <dir>.

The error message may indicate that the pool folder does not exist. To be sure, you can post the error message next time.
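
The two preconditions above (the directory exists, and it contains a worktodo.txt) can be checked before starting gpuowl. Here is a minimal Python sketch; the helper is hypothetical and not part of gpuowl or primenet.py, and it checks only what is stated in this post.

```python
from pathlib import Path


def pool_problems(pool_dir):
    """Return a list of human-readable problems with a -pool directory (empty if none)."""
    problems = []
    pool = Path(pool_dir)
    if not pool.is_dir():
        problems.append(f"pool directory does not exist: {pool}")
        return problems
    if not (pool / "worktodo.txt").is_file():
        problems.append(f"no worktodo.txt in {pool}")
    return problems
```

Calling `pool_problems(...)` with the same path that is passed to -pool, and printing the result, shows quickly whether the path itself is the problem.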

Quote:
2. How exactly does one determine the ideal FFT length?
Thanks
The default should be good. There is no need to tinker with the FFT length unless gpuowl is not running (getting 3 errors in a row and exiting).
2020-06-05, 13:09   #2265
Xebecer
Jun 2019
Ipswich, MA
32 Posts

Quote:
Originally Posted by preda
The error message may indicate that the pool folder does not exist. To be sure, you can post the error message next time.

I created a shared folder, put a results.txt and a worktodo.txt in it, and used

-pool C:\Users\xebec\Desktop\GPUOwl Shared

in the config.txt file. I get:

Can't open 'C' (mode 'ab')
Exception NSt10filesystem7__cxx1116filesystem_errorE" filesystem error" can't open file" No error [C"\Users\xebec\Desktop\GPUOwl_Shared/]
Bye

Last fiddled with by paulunderwood on 2020-06-05 at 13:14 Reason: fixed quote
2020-06-05, 13:16   #2266
paulunderwood
Sep 2002
Database er0rr
110011010100₂ Posts

Quote:
Originally Posted by Xebecer
-pool C:\Users\xebec\Desktop\GPUOwl Shared in the config.txt file. I get:



Can't open 'C' (mode 'ab')
Exception NSt10filesystem7__cxx1116filesystem_errorE" filesystem error" can't open file" No error [C"\Users\xebec\Desktop\GPUOwl_Shared/]
Bye
In the config you have "GPUOwl Shared", but the error shows an underscore: "GPUOwl_Shared". There is also an extraneous "/" at the end of the path.

Last fiddled with by paulunderwood on 2020-06-05 at 13:22
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.