mersenneforum.org

gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

Xyzzy 2020-06-08 20:39

We are trying to compile on CentOS 7.

We have python3 and gcc 7, 8 and 9 installed.

All versions of g++ fail here:[CODE]$ make

./tools/expand.py < gpuowl.cl > gpuowl-expanded.cl
cat head.txt gpuowl-expanded.cl tail.txt > gpuowl-wrap.cpp
echo \"`git describe --long --dirty --always`\" > version.new
diff -q -N version.new version.inc >/dev/null || mv version.new version.inc
echo Version: `cat version.inc`
Version: "v6.11-318-g3109989-dirty"
g++ -MT Pm1Plan.o -MMD -MP -MF .d/Pm1Plan.Td -Wall -O2 -std=c++17 -c -o Pm1Plan.o Pm1Plan.cpp
g++ -MT GmpUtil.o -MMD -MP -MF .d/GmpUtil.Td -Wall -O2 -std=c++17 -c -o GmpUtil.o GmpUtil.cpp
GmpUtil.cpp: In function ‘std::string GCD(u32, const std::vector<unsigned int>&, u32)’:
GmpUtil.cpp:51:25: error: ‘gcd’ was not declared in this scope
   mpz_class resultGcd = gcd((mpz_class{1} << exp) - 1, w - sub);
                         ^~~
GmpUtil.cpp:51:25: note: suggested alternative: ‘gcvt’
   mpz_class resultGcd = gcd((mpz_class{1} << exp) - 1, w - sub);
                         ^~~
                         gcvt
make: *** [Makefile:30: GmpUtil.o] Error 1[/CODE]
[CODE]$ cat GmpUtil.cpp
// Copyright (C) Mihai Preda.

#include "GmpUtil.h"

#include <gmp.h>
#include <cmath>
#include <cassert>

using namespace std;

namespace {

mpz_class mpz(const vector<u32>& words) {
  mpz_class b{};
  mpz_import(b.get_mpz_t(), words.size(), -1 /*order: LSWord first*/, sizeof(u32), 0 /*endianess: native*/, 0 /*nails*/, words.data());
  return b;
}

mpz_class primorial(u32 p) {
  mpz_class b{};
  mpz_primorial_ui(b.get_mpz_t(), p);
  return b;
}

mpz_class powerSmooth(u32 exp, u32 B1) {
  mpz_class a{exp};
  a *= 256; // boost 2s.
  for (int k = log2(B1); k >= 1; --k) { a *= primorial(pow(B1, 1.0 / k)); }
  return a;
}

u32 sizeBits(mpz_class a) { return mpz_sizeinbase(a.get_mpz_t(), 2); }

}

vector<bool> bitsMSB(const mpz_class& a) {
  vector<bool> bits;
  int nBits = sizeBits(a);
  bits.reserve(nBits);
  for (int i = nBits - 1; i >= 0; --i) { bits.push_back(mpz_tstbit(a.get_mpz_t(), i)); }
  assert(int(bits.size()) == nBits);
  return bits;
}

// return GCD(bits - sub, 2^exp - 1) as a decimal string if GCD!=1, or empty string otherwise.
std::string GCD(u32 exp, const std::vector<u32>& words, u32 sub) {
  mpz_class w = mpz(words);
  if (w == 0 || w == sub) {
    throw std::domain_error("GCD invalid input");
  }
  mpz_class resultGcd = gcd((mpz_class{1} << exp) - 1, w - sub);
  return (resultGcd == 1) ? ""s : resultGcd.get_str();
}

// MSB: Most Significant Bit first (at index 0).
vector<bool> powerSmoothMSB(u32 exp, u32 B1) { return bitsMSB(powerSmooth(exp, B1)); }

int jacobi(u32 exp, const std::vector<u32>& words) {
  mpz_class w = mpz(words) - 2;
  mpz_class m = (mpz_class{1} << exp) - 1;
  return mpz_jacobi(w.get_mpz_t(), m.get_mpz_t());
}
[/CODE]:help:

Edit: We learned how to check out older versions of gpuowl. The same error happens in the other versions we tried (from roughly the last month).

ewmayer 2020-06-08 20:43

[QUOTE=kriesel;547460]I have a Radeon VII on Windows 10 that, after a hang, a process kill, and a relaunch, has decided to run at 570 MHz, which is below the nominal minimum. It seems to be doing OK there, in an odd sort of ultra-power-saving mode. Indicated power consumption in GPU-Z is 61 W on that GPU.[/QUOTE]

Do you have a similar rocm-smi cli under Windows as we Linuxers have? How do you manually fiddle the sclk setting under Windows? Because it sounds like your sclk setting simply got reset to a low level.

kriesel 2020-06-09 01:17

[QUOTE=ewmayer;547489]Do you have a similar rocm-smi cli under Windows as we Linuxers have? How do you manually fiddle the sclk setting under Windows? Because it sounds like your sclk setting simply got reset to a low level.[/QUOTE]ROCm is Linux-specific, so no. Windows seems to be the neglected stepchild there. [URL]https://github.com/RadeonOpenCompute/ROCm/issues/18[/URL]
(And the string "windo" is not present in [url]https://github.com/RadeonOpenCompute/ROCm[/url]; there's only a list of supported Linux distributions.) Compared with NVIDIA, which supports nvidia-smi on both Linux and Windows for professional, gaming, and low-end GPUs, it's a definite negative for AMD.

There is a graphical interactive tool in the AMD Radeon Software package that includes the Adrenalin 2020 driver 20.4.2 (GPU-Z identifies it as Driver version 26.20.15029.27017 (Adrenalin 20.4.2) DCH / Win10 64, May 15 2020). As far as I can tell, if I want to treat one GPU differently, and I do, I must tune each separately, which appears to take separate save files. Those are in XML, in which I'm pretty illiterate. There's no such thing as an sclk setting as far as I can tell. I can dial GPU or memory clocks up by individual percents over limited ranges, or switch modes and dial their limits by kilohertz increments. This tool says that slowww GPU should be running at 808 to ~1670 MHz. GPU-Z reports it at 570 MHz, as does HWiNFO, and the slower gpuowl timings on the same exponent confirm it. I think it will take a system restart or power cycle to clear it up. Disabling and re-enabling it in Windows Device Manager, after carefully identifying it by PCI bus number, did not clear it. 570 MHz would be slow even for the HD 4600 IGP, which shows 600 MHz at idle.
There is no "570" string in the XML file (see the code section below). There are min, mid, and max values in MHz, all above 800. (I checked other files too.) I added the annotations in green.

This interface difference is why, when Linux users talk about sclk 2 through sclk 5, it doesn't translate without a conversion table. I have also seen different posters give different MHz values for the same sclk level in a thread or two here.

There's also something called WattMan, which I've used; it is also graphical, but it is not installed on this system. I think it came with a much older driver package from a different source.
[CODE]<?xml version="1.0" encoding="UTF-8"?>
<GPU DevID="66AF" RevID="C1">
  <PPW Value="1"/>
  <FEATURE ID="100" Enabled="0">
    <STATES>
      <STATE ID="0" Enabled="False" Value="0"/>
    </STATES>
  </FEATURE>
  <FEATURE ID="101" Enabled="3">
    <STATES>
      <STATE ID="0" Enabled="True" Value="0"/>
    </STATES>
  </FEATURE>
  <FEATURE ID="4" Enabled="True">
    <STATES>
      <STATE ID="0" Enabled="False" Value="808"/> [COLOR=green]minimum gpu clock[/COLOR]
      <STATE ID="1" Enabled="False" Value="1240"/> [COLOR=Green]midpoint adjusted from maximum being lowered[/COLOR]
      <STATE ID="2" Enabled="False" Value="1672"/> [COLOR=green]lowered maximum[/COLOR]
      <STATE ID="3" Enabled="False" Value="808"/>
      <STATE ID="4" Enabled="False" Value="1672"/> [COLOR=green]lowered maximum[/COLOR]
    </STATES>
  </FEATURE>
  <FEATURE ID="12" Enabled="False">
    <STATES>
      <STATE ID="0" Enabled="False" Value="712"/>
      <STATE ID="1" Enabled="False" Value="797"/>
      <STATE ID="2" Enabled="False" Value="1023"/>
    </STATES>
  </FEATURE>
  <FEATURE ID="5" Enabled="True">
    <STATES>
      <STATE ID="0" Enabled="False" Value="1122"/>[COLOR=Green] vram clock[/COLOR]
    </STATES>
  </FEATURE>
  <FEATURE ID="9" Enabled="False">
    <STATES>
      <STATE ID="0" Enabled="False" Value="0"/>
    </STATES>
  </FEATURE>
  <FEATURE ID="8" Enabled="False">
    <STATES>
      <STATE ID="0" Enabled="True" Value="0"/>
      <STATE ID="1" Enabled="True" Value="0"/>
      <STATE ID="2" Enabled="True" Value="-14"/> [COLOR=green]power limit modified[/COLOR]
    </STATES>
  </FEATURE>
  <FEATURE ID="17" Enabled="False">
    <STATES>
      <STATE ID="0" Enabled="True" Value="0"/>
    </STATES>
  </FEATURE>
  <FEATURE ID="19" Enabled="False">
    <STATES>
      <STATE ID="0" Enabled="True" Value="0"/>
    </STATES>
  </FEATURE>
  <FEATURE ID="20" Enabled="False">
    <STATES>
      <STATE ID="0" Enabled="True" Value="0"/>
    </STATES>
  </FEATURE>
  <FEATURE ID="21" Enabled="False">
    <STATES>
      <STATE ID="0" Enabled="True" Value="0"/>
    </STATES>
  </FEATURE>
  <FEATURE ID="22" Enabled="True">
    <STATES>
      <STATE ID="0" Enabled="False" Value="30"/>
      <STATE ID="1" Enabled="False" Value="7"/>
      <STATE ID="2" Enabled="False" Value="50"/>
      <STATE ID="3" Enabled="False" Value="8"/>
      <STATE ID="4" Enabled="False" Value="72"/>
      <STATE ID="5" Enabled="False" Value="34"/>
      <STATE ID="6" Enabled="False" Value="89"/>
      <STATE ID="7" Enabled="False" Value="71"/>
      <STATE ID="8" Enabled="False" Value="100"/>
      <STATE ID="9" Enabled="False" Value="100"/>
    </STATES>
  </FEATURE>
</GPU>
[/CODE]

preda 2020-06-09 02:33

[QUOTE=Xyzzy;547488]We are trying to compile on CentOS 7.
[/QUOTE]

Probably you need a newer version of GMP. Could you please find the file gmpxx.h and grep it for "gcd", e.g.
[CODE]
$ grep gcd /usr/include/gmpxx.h

struct __gmp_gcd_function
{ mpz_gcd(z, w, v); }
{ mpz_gcd_ui(z, w, l); }
{ __GMPXX_TMPZ_D; mpz_gcd (z, w, temp); }
__GMP_DEFINE_BINARY_FUNCTION(gcd, __gmp_gcd_function)
[/CODE]

Xyzzy 2020-06-09 02:47

[code]$ ls -l gmp*
-rw-r--r--. 1 root root 2289 Aug 2 2017 gmp.h
-rw-r--r--. 1 root root 2473 Aug 2 2017 gmp-mparam.h
-rw-r--r--. 1 root root 11524 Aug 2 2017 gmp-mparam-x86_64.h
-rw-r--r--. 1 root root 83249 Aug 2 2017 gmp-x86_64.h
-rw-r--r--. 1 root root 113143 Aug 2 2017 gmpxx.h[/code]
[code]$ grep gcd gmp*
gmp-x86_64.h:#define mpz_gcd __gmpz_gcd
gmp-x86_64.h:__GMP_DECLSPEC void mpz_gcd (mpz_ptr, mpz_srcptr, mpz_srcptr);
gmp-x86_64.h:#define mpz_gcd_ui __gmpz_gcd_ui
gmp-x86_64.h:__GMP_DECLSPEC unsigned long int mpz_gcd_ui (mpz_ptr, mpz_srcptr, unsigned long int);
gmp-x86_64.h:#define mpz_gcdext __gmpz_gcdext
gmp-x86_64.h:__GMP_DECLSPEC void mpz_gcdext (mpz_ptr, mpz_ptr, mpz_ptr, mpz_srcptr, mpz_srcptr);
gmp-x86_64.h:#define mpn_gcd __MPN(gcd)
gmp-x86_64.h:__GMP_DECLSPEC mp_size_t mpn_gcd (mp_ptr, mp_ptr, mp_size_t, mp_ptr, mp_size_t);
gmp-x86_64.h:#define mpn_gcd_1 __MPN(gcd_1)
gmp-x86_64.h:__GMP_DECLSPEC mp_limb_t mpn_gcd_1 (mp_srcptr, mp_size_t, mp_limb_t) __GMP_ATTRIBUTE_PURE;
gmp-x86_64.h:#define mpn_gcdext_1 __MPN(gcdext_1)
gmp-x86_64.h:__GMP_DECLSPEC mp_limb_t mpn_gcdext_1 (mp_limb_signed_t *, mp_limb_signed_t *, mp_limb_t, mp_limb_t);
gmp-x86_64.h:#define mpn_gcdext __MPN(gcdext)
gmp-x86_64.h:__GMP_DECLSPEC mp_size_t mpn_gcdext (mp_ptr, mp_ptr, mp_size_t *, mp_ptr, mp_size_t, mp_ptr, mp_size_t);[/code]
[code]$ yum info gmp-devel.x86_64
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
* base: mirrors.advancedhosters.com
* extras: mirror.steadfastnet.com
* updates: mirror.dal10.us.leaseweb.net
Installed Packages
Name : gmp-devel
Arch : x86_64
Epoch : 1
Version : 6.0.0
Release : 15.el7
Size : 340 k
Repo : installed
From repo : base
Summary : Development tools for the GNU MP arbitrary precision library
URL : http://gmplib.org/
License : LGPLv3+ or GPLv2+
Description : The libraries, header files and documentation for using the GNU MP
: arbitrary precision library in applications.
:
: If you want to develop applications which will use the GNU MP library,
: you'll need to install the gmp-devel package. You'll also need to
: install the gmp package.[/code]

preda 2020-06-09 03:36

[QUOTE=Xyzzy;547513]Version : 6.0.0
[/QUOTE]

Yes. Could you please install GMP 6.1 or 6.2? It seems gcd() was added to the C++ wrapper (gmpxx.h) after 6.0, though that is a hypothesis.

Xyzzy 2020-06-09 04:19

We already had to install python3 and gcc 8.3.1; it took us several hours today to learn how to install the newer compiler toolchain. We are going to call it quits on CentOS 7. It isn't worth our effort or your time, since we have a version running fine on CentOS 8.

Thanks for the help!

PS: Instructions for gcc 8 on CentOS 7: [URL]https://stackoverflow.com/questions/55345373/how-to-install-gcc-g-8-on-centos[/URL]

kriesel 2020-06-09 14:58

[QUOTE=kriesel;547508] or switch modes and dial their limits by [COLOR=Red]kilo[/COLOR]hertz increments. [/QUOTE]Oops, no: make that megahertz increments.
A warm restart (shutdown -r) cleared the anomalous stuck-at-570 MHz situation, for now at least.

kriesel 2020-06-09 15:04

George, Mihai, what percentage performance improvement is there from the FMA-to-MUL change in [URL="https://github.com/preda/gpuowl/commit/c336704c220e16ff246d177d7be5908ed1d445db"]https://github.com/preda/gpuowl/commit/c336704c220e16ff246d177d7be5908ed1d445db[/URL]?
Does "May make MAX_ACCURACY ever so marginally faster" mean the optimization gains are coming to an end?

Prime95 2020-06-09 16:35

[QUOTE=kriesel;547540]George, Mihai, what percentage performance improvement is there from the FMA-to-MUL change in [URL="https://github.com/preda/gpuowl/commit/c336704c220e16ff246d177d7be5908ed1d445db"]https://github.com/preda/gpuowl/commit/c336704c220e16ff246d177d7be5908ed1d445db[/URL]?
Does "May make MAX_ACCURACY ever so marginally faster" mean the optimization gains are coming to an end?[/QUOTE]

Should be no faster on Radeon VII.

Yes, we are running out of ideas to make the code faster.

kriesel 2020-06-10 04:50

[QUOTE=Prime95;547544]Yes, we are running out of ideas to make the code faster.[/QUOTE]It's been a good run. So now it's up to the engineers to create faster hardware again, and more of it.

I've hit a bad spot in a 9M FFT LL DC run. It had been repeating for hours, so I reran it with -use STATS. I don't see anything in the stats indicating trouble, but there it is: jacobi == 1 instead of -1. Bits/word looks OK to me at under 17.
[CODE]2020-06-09 20:58:08 gpuowl v6.11-292-gecab9ae
2020-06-09 20:58:08 config: -user kriesel -cpu asr2/radeonvii2 -d 2 -use NO_ASM,STATS -maxAlloc 15000
2020-06-09 20:58:08 device 2, unique id ''
2020-06-09 20:58:08 asr2/radeonvii2 159805579 FFT: 9M 1K:9:512 (16.93 bpw)
2020-06-09 20:58:08 asr2/radeonvii2 Expected maximum carry32: 2C430000
2020-06-09 20:58:11 asr2/radeonvii2 OpenCL args "-DEXP=159805579u -DWIDTH=1024u -DSMALL_HEIGHT=512u -DMIDDLE=9u -DWEIGHT_STEP=0x8.60730821aeaf8p-3 -DIWEIGHT_STEP=0xf.47c6f52dba228p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4 -DPM1=0 -DAMDGPU=1 -DNO_ASM=1 -DSTATS=1 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-06-09 20:58:24 asr2/radeonvii2 OpenCL compilation in 13.20 s
2020-06-09 20:58:25 asr2/radeonvii2 159805579 LL 98500000 loaded: 70acc69859d13f28
2020-06-09 21:00:54 asr2/radeonvii2 Roundoff: N=100000, mean 0.057751, SD 0.003232, CV 0.055957, max 0.084695, z 136.9 (pErr 0.000000%)
2020-06-09 21:00:54 asr2/radeonvii2 159805579 LL 98600000 61.70%; 1489 us/it; ETA 1d 01:19; a2af2a7b425c7d75
2020-06-09 21:03:23 asr2/radeonvii2 Roundoff: N=100000, mean 0.057745, SD 0.003224, CV 0.055838, max 0.085250, z 137.2 (pErr 0.000000%)
2020-06-09 21:03:23 asr2/radeonvii2 159805579 LL 98700000 61.76%; 1488 us/it; ETA 1d 01:15; d05666c89b157db8
2020-06-09 21:05:51 asr2/radeonvii2 Roundoff: N=100000, mean 0.057751, SD 0.003235, CV 0.056023, max 0.091412, z 136.7 (pErr 0.000000%)
2020-06-09 21:05:52 asr2/radeonvii2 159805579 LL 98800000 61.83%; 1487 us/it; ETA 1d 01:12; b16756b2ece2dc45
2020-06-09 21:08:20 asr2/radeonvii2 Roundoff: N=100000, mean 0.057747, SD 0.003231, CV 0.055950, max 0.085328, z 136.9 (pErr 0.000000%)
2020-06-09 21:08:20 asr2/radeonvii2 159805579 LL 98900000 61.89%; 1486 us/it; ETA 1d 01:09; 0ffbd0ea6349483d
2020-06-09 21:10:48 asr2/radeonvii2 Roundoff: N=100000, mean 0.057737, SD 0.003227, CV 0.055895, max 0.082131, z 137.0 (pErr 0.000000%)
2020-06-09 21:10:49 asr2/radeonvii2 159805579 LL 99000000 61.95%; 1486 us/it; ETA 1d 01:06; 7b7cb65b21554200
2020-06-09 21:13:17 asr2/radeonvii2 Roundoff: N=100000, mean 0.057735, SD 0.003235, CV 0.056027, max 0.084226, z 136.7 (pErr 0.000000%)
2020-06-09 21:13:17 asr2/radeonvii2 159805579 LL 99100000 62.01%; 1485 us/it; ETA 1d 01:03; 82f79a16d2dbdb5e
2020-06-09 21:15:46 asr2/radeonvii2 Roundoff: N=100000, mean 0.057739, SD 0.003235, CV 0.056027, max 0.087179, z 136.7 (pErr 0.000000%)
2020-06-09 21:15:46 asr2/radeonvii2 159805579 LL 99200000 62.08%; 1486 us/it; ETA 1d 01:01; 6e572a98c0492314
2020-06-09 21:15:46 asr2/radeonvii2 159805579 EE 99000000 ([COLOR=red]jacobi == 1[/COLOR])
2020-06-09 21:15:47 asr2/radeonvii2 159805579 LL 98500000 loaded: 70acc69859d13f28
2020-06-09 21:18:15 asr2/radeonvii2 Roundoff: N=100000, mean 0.057751, SD 0.003232, CV 0.055957, max 0.084695, z 136.9 (pErr 0.000000%)
2020-06-09 21:18:15 asr2/radeonvii2 159805579 LL 98600000 61.70%; 1486 us/it; ETA 1d 01:16; a2af2a7b425c7d75
2020-06-09 21:20:43 asr2/radeonvii2 Roundoff: N=100000, mean 0.057745, SD 0.003224, CV 0.055838, max 0.085250, z 137.2 (pErr 0.000000%)
2020-06-09 21:20:44 asr2/radeonvii2 159805579 LL 98700000 61.76%; 1486 us/it; ETA 1d 01:13; d05666c89b157db8
2020-06-09 21:23:12 asr2/radeonvii2 Roundoff: N=100000, mean 0.057751, SD 0.003235, CV 0.056023, max 0.091412, z 136.7 (pErr 0.000000%)
2020-06-09 21:23:12 asr2/radeonvii2 159805579 LL 98800000 61.83%; 1487 us/it; ETA 1d 01:12; b16756b2ece2dc45
2020-06-09 21:25:41 asr2/radeonvii2 Roundoff: N=100000, mean 0.057747, SD 0.003231, CV 0.055950, max 0.085328, z 136.9 (pErr 0.000000%)
2020-06-09 21:25:41 asr2/radeonvii2 159805579 LL 98900000 61.89%; 1486 us/it; ETA 1d 01:09; 0ffbd0ea6349483d
2020-06-09 21:28:09 asr2/radeonvii2 Roundoff: N=100000, mean 0.057737, SD 0.003227, CV 0.055895, max 0.082131, z 137.0 (pErr 0.000000%)
2020-06-09 21:28:10 asr2/radeonvii2 159805579 LL 99000000 61.95%; 1486 us/it; ETA 1d 01:06; 7b7cb65b21554200
2020-06-09 21:30:38 asr2/radeonvii2 Roundoff: N=100000, mean 0.057735, SD 0.003235, CV 0.056027, max 0.084226, z 136.7 (pErr 0.000000%)
2020-06-09 21:30:38 asr2/radeonvii2 159805579 LL 99100000 62.01%; 1486 us/it; ETA 1d 01:04; 82f79a16d2dbdb5e
2020-06-09 21:30:47 asr2/radeonvii2 Stopping, please wait..
2020-06-09 21:30:47 asr2/radeonvii2 Roundoff: N=6000, mean 0.057697, SD 0.003223, CV 0.055866, max 0.076871, z 137.2 (pErr 0.000000%)
2020-06-09 21:30:48 asr2/radeonvii2 159805579 LL 99106000 62.02%; 1551 us/it; ETA 1d 02:09; 8a382a6d30513954
2020-06-09 21:30:48 asr2/radeonvii2 waiting for the Jacobi check to finish..
2020-06-09 21:31:21 asr2/radeonvii2 159805579 EE 99000000 ([COLOR=Red]jacobi == 1[/COLOR])
2020-06-09 21:31:21 asr2/radeonvii2 Exiting because "stop requested"
2020-06-09 21:31:21 asr2/radeonvii2 Bye[/CODE]Running a 5M PRP on the same hardware and conditions (clock rates & temperatures) does not produce GEC errors. Tried the other 9M and the first 10M fft with same Jacobi=1 results.


All times are UTC.