mersenneforum.org

gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

Prime95 2020-05-09 22:35

[QUOTE=ewmayer;544990]
Say I start making it the default ... if a run hits an expo which needs an even-higher extra-accuracy setting, will that automatically kick in, thus overriding the user's setting of the flag?[/QUOTE]

If you specify the MM2_CHAIN setting and a different MM2_CHAIN setting is auto-generated, I do not know which one will win.

ewmayer 2020-05-09 23:03

[QUOTE=Prime95;544995]If you specify the MM2_CHAIN setting and a different MM2_CHAIN setting is auto-generated, I do not know which one will win.[/QUOTE]

Guess I'll find out once the 5.5M wavefront gets closer to 106M ... if my calculations are correct the next step-up to MM2_CHAIN=2 is for p >105313332, so we're getting pretty close.
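For reference, gpuowl reports FFT load as bits per word (bpw): the exponent divided by the FFT length in words, where "5.50M" means 5.5 × 2^20 = 5,767,168 words. A minimal Python sketch, assuming that definition (the helper names `fft_words` and `bits_per_word` are hypothetical). It reproduces the 18.20 bpw that gpuowl prints for 104954387 in the log later in this thread, and puts the quoted step-up point at about 18.26 bpw:

```python
def fft_words(mega: float) -> int:
    """Convert an FFT size in gpuowl's 'M' units (e.g. 5.50M) to words; 1M = 2**20."""
    return round(mega * 2**20)

def bits_per_word(p: int, words: int) -> float:
    """Average number of exponent bits carried by each FFT word."""
    return p / words

n = fft_words(5.5)                            # 5767168 words
print(f"{bits_per_word(104954387, n):.2f}")   # 18.20, matching the gpuowl log line
print(f"{bits_per_word(105313332, n):.2f}")   # 18.26, the MM2_CHAIN=2 step-up point
```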

Related question for you & Mihai: Can the program determine at runtime if the MM2_CHAIN setting needs upping? Because ROEs are not necessarily monotonic with exponent (much depends on the particular DWT weights and their rounding-to-double), I found it useful in Mlucas to allow runtime detection of such conditions, culminating in an upping of FFT length (and reset of the same-FFT-length ease-up params) if the highest setting proved insufficient for the exponent under test. But that all relies on per-iteration ROE data sampling.

Oh, have you tried forcing MM2_CHAIN=1 in your own runs? It would be useful to see how broadly the "this runs faster" effect applies. Just Radeon VIIs? Just some subset thereof?
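The runtime escalation described above for Mlucas can be pictured as a small state machine: raise the accuracy setting when the sampled round-off error (ROE) is too high, and once the highest setting is exhausted, move to the next FFT length and reset. This is a hypothetical Python illustration, not code from Mlucas or gpuowl; the ROE limit, the number of levels, and all names are assumptions:

```python
from dataclasses import dataclass

# Hypothetical values, chosen only for illustration.
ROE_LIMIT = 0.4     # max tolerated round-off error per sampled iteration
MAX_LEVEL = 2       # highest extra-accuracy level available at one FFT length

@dataclass(frozen=True)
class Config:
    fft_words: int
    level: int = 0  # current extra-accuracy level at this FFT length

def escalate(cfg: Config, max_roe: float, next_fft_words: int) -> Config:
    """Return the config to use after an iteration whose sampled max ROE is max_roe."""
    if max_roe <= ROE_LIMIT:
        return cfg                                   # error acceptable: no change
    if cfg.level < MAX_LEVEL:
        return Config(cfg.fft_words, cfg.level + 1)  # try a more accurate setting first
    # Highest setting was insufficient: step up the FFT length and reset the level.
    return Config(next_fft_words, 0)

cfg = Config(5 * 2**20 + 2**19)                      # 5.5M words
for roe in (0.30, 0.45, 0.45, 0.45):
    cfg = escalate(cfg, roe, 6 * 2**20)
    print(cfg)
```

As the post notes, this only works if ROE is sampled during the run, which costs some throughput.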

Prime95 2020-05-10 01:48

[QUOTE=ewmayer;544996]Guess I'll find out once the 5.5M wavefront gets closer to 106M ... if my calculations are correct the next step-up to MM2_CHAIN=2 is for p >105313332, so we're getting pretty close.[/quote]

The next step is the addition of MM_CHAIN=1.

[quote]
Related question for you & Mihai: Can the program determine at runtime if the MM2_CHAIN setting needs upping? Because ROEs are not necessarily monotonic with exponent (much depends on the particular DWT weights and their rounding-to-double), I found it useful in Mlucas to allow runtime detection of such conditions, culminating in an upping of FFT length (and reset of the same-FFT-length ease-up params) if the highest setting proved insufficient for the exponent under test. But that all relies on per-iteration ROE data sampling.[/quote]

I've found the average ROE does increase fairly predictably.

[quote]
Oh, have you tried forcing MM2_CHAIN=1 in your own runs? It would be useful to see how broadly the "this runs faster" effect applies. Just Radeon VIIs? Just some subset thereof?[/QUOTE]

Ah, the mysteries of the ROCm optimizer. Preda and I generally time MIDDLE=10 when selecting the default optimizations. Last time I tested (in ROCm 3.1), no MM2_CHAIN was faster than MM2_CHAIN=1 for MIDDLE=10.

ewmayer 2020-05-13 21:13

Cross-posting from the "R7 @ newegg for $500" thread: the new build is alive, with the same Ubuntu 19.10 image I used to upgrade my Haswell system to host a Radeon VII (but that system remains on ROCm 2.10 for now), ROCm 3.3 installed and the latest gpuowl built, but it is having OpenCL issues. I first hit a missing-shared-lib error on program invocation, which Paul Underwood helped me look into. Here is the OpenCL-install info from the system as of last night:
[code]apt-cache search libOpenCL
ocl-icd-libopencl1 - Generic OpenCL ICD Loader
libopencl-clang-dev - thin wrapper for clang -- development files
libopencl-clang9 - thin wrapper for clang
nvidia-libopencl1-331 - Transitional package for nvidia-libopencl1-340
nvidia-libopencl1-331-updates - Transitional package for nvidia-libopencl1-340
nvidia-libopencl1-340 - NVIDIA OpenCL Driver and ICD Loader library
nvidia-libopencl1-340-updates - Transitional package for nvidia-libopencl1-340
nvidia-libopencl1-384 - Transitional package for nvidia-headless-390[/code]
But none of the above was actually installed:
[code]apt list --installed | grep libopencl1

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.[/code]
Nothing further. So I did 'sudo apt install ocl-icd-libopencl1', which produces this entry in the above listing:
[code]ocl-icd-libopencl1/eoan,now 2.2.11-1ubuntu1 amd64 [installed][/code]
That solves the missing-shared-lib problem; now gpuowl starts but immediately coredumps:
[code]2020-05-13 13:31:31 gpuowl v6.11-278-ga39cc1a
2020-05-13 13:31:31 Note: not found 'config.txt'
2020-05-13 13:31:31 device 0, unique id 'df7080c172fd5d6e'
2020-05-13 13:31:31 df7080c172fd5d6e 104954387 FFT: 5.50M 1K:11:256 (18.20 bpw)
2020-05-13 13:31:31 df7080c172fd5d6e Expected maximum carry32: 50D10000
Segmentation fault (core dumped)[/code]
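The missing-shared-lib failure described above comes down to the OpenCL ICD loader (libOpenCL) not being visible to the dynamic linker. A minimal Python sketch of a pre-flight check using only the standard-library `ctypes` module (the function name `check_opencl_loader` is hypothetical). It only tells you whether the loader can be found and opened; it cannot catch a later segfault inside the runtime like the one in the log:

```python
import ctypes
import ctypes.util
from typing import Optional

def check_opencl_loader() -> Optional[str]:
    """Return the OpenCL loader's library name if it can be found and loaded, else None."""
    name = ctypes.util.find_library("OpenCL")   # search the system library paths for libOpenCL
    if name is None:
        return None                              # e.g. ocl-icd-libopencl1 not installed
    try:
        ctypes.CDLL(name)                        # actually try to dlopen() it
    except OSError:
        return None
    return name

print(check_opencl_loader() or "libOpenCL not found: install an ICD loader such as ocl-icd-libopencl1")
```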

Prime95 2020-05-13 21:16

Did you install libncurses5? rocm-dev?

Does clinfo work?

ewmayer 2020-05-13 21:53

[QUOTE=Prime95;545303]Did you install libncurses5? rocm-dev?[/quote]
I did the same install I used for the Haswell system, which IIRC was geared toward ROCm 3.0 (or maybe it was 3.1), which I later overrode to 2.10 to be able to run:
[code]wget -qO - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | sudo apt-key add -
echo 'deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main' | sudo tee /etc/apt/sources.list.d/rocm.list
sudo apt update && sudo apt install rocm-dev[/code]
[QUOTE]Does clinfo work?[/QUOTE]

'clinfo' gives
[code]Command 'clinfo' not found, but can be installed with:

sudo apt install clinfo[/code]
So I did that; now 'clinfo' gives a coredump. As noted above, I already had the latest rocm-dev, but I also grabbed libncurses5 per your suggestion. Now clinfo gives the expected dumpage (compressed txt-file attached), and we are looking good on the running-gpuowl front: I'm getting ~1355 us/iter for each of 2 runs @5.5M FFT, expos ~105M (i.e. I didn't need to use the force-MM2_CHAIN=1 speedup trick, since that is the default for the expos queued up on this new build). That is appreciably faster than the 1410 us/iter I'm getting for my 2 jobs on the Haswell-system Radeon card; I wonder if ROCm 3.3 (new build) versus 2.10 (Haswell) might be the difference.
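To put "appreciably faster" in numbers: a PRP test of M(p) takes about p squarings, so per-iteration time translates directly into throughput and run time. A small Python sketch of that arithmetic, assuming roughly p iterations per test (helper names are hypothetical); 1410 vs 1355 us/iter works out to about 4.1% more throughput, and each ~105M run at 1355 us/iter takes about 1.65 days:

```python
def pct_faster(old_us: float, new_us: float) -> float:
    """Throughput gain (%) of the new setup over the old, from per-iteration times."""
    return (old_us / new_us - 1) * 100

def eta_days(p: int, us_per_iter: float) -> float:
    """Approximate wall time for one PRP test of M(p), assuming about p iterations."""
    return p * us_per_iter * 1e-6 / 86400

print(f"{pct_faster(1410, 1355):.1f}% faster")
print(f"{eta_days(105_000_000, 1355):.2f} days per ~105M exponent")
```

Since both runs on a card proceed concurrently at the quoted rate, the per-card output is two results per ~1.65 days.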

Prime95 2020-05-14 01:12

install libncurses5

kriesel 2020-05-14 09:57

Test= lines in worktodo.txt are a no-go (ignored), DoubleCheck= lines are ok:
 
[CODE]2020-05-14 04:53:20 gpuowl v6.11-278-ga39cc1a
2020-05-14 04:53:20 config: -user kriesel -cpu asr2/radeonvii3 -d 3 -use NO_ASM -maxAlloc 15000
2020-05-14 04:53:20 device 3, unique id ''
2020-05-14 04:53:20 asr2/radeonvii3 worktodo.txt line ignored: "Test=(AID),91493761,77,1"
2020-05-14 04:53:20 asr2/radeonvii3 Bye
[/CODE]
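Since this build silently skips the Test= line, a pre-flight check on worktodo.txt can save a wasted restart. A hypothetical Python sketch, assuming the usual PrimeNet field layout `Type=AID,exponent,bits_factored,pm1_done`; the `ACCEPTED` set reflects only what this post observed for this build, not documented gpuowl behavior, and all names are illustrative:

```python
from typing import NamedTuple, Optional

# Assumption: worktypes this particular gpuowl build accepted, per the log above.
ACCEPTED = {"DoubleCheck"}

class Assignment(NamedTuple):
    worktype: str
    aid: str
    exponent: int
    bits_factored: int
    pm1_done: bool

def parse_line(line: str) -> Optional[Assignment]:
    """Parse a PrimeNet-style 'Type=AID,exponent,bits,pm1' line; None if malformed."""
    if "=" not in line:
        return None
    worktype, _, rest = line.strip().partition("=")
    fields = rest.split(",")
    if len(fields) != 4:
        return None
    aid, exp, bits, pm1 = fields
    return Assignment(worktype, aid, int(exp), int(bits), pm1 == "1")

def usable(line: str) -> bool:
    """True if the line parses and its worktype is one this build is assumed to accept."""
    a = parse_line(line)
    return a is not None and a.worktype in ACCEPTED

print(usable("Test=(AID),91493761,77,1"))         # skipped by this build
print(usable("DoubleCheck=(AID),91493761,77,1"))  # accepted
```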

kriesel 2020-05-16 13:45

Mihai,

Please add pseudorandom shift to gpuowl. Its absence is interfering with doublecheck sampling of higher exponents. (I'm attempting to fill in double checks for LL and for PRP from the current state to where there's at least one of each for every million-exponent-range bin up to 200M, well ahead of the first-test wavefront. [URL]https://www.mersenneforum.org/showpost.php?p=501177&postcount=3[/URL] [URL]https://www.mersenneforum.org/showpost.php?p=501181&postcount=6[/URL]) As Radeon VIIs become more common in the GIMPS fleet, and further conversion from CUDALucas to gpuowl occurs on NVIDIA, the issue will become more common in LL and PRP at the wavefront also. It's tedious to check shifts one by one, and I missed a few.
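For readers unfamiliar with shifts: a shift-capable program starts from the initial value multiplied by 2^s modulo M_p = 2^p - 1 for a random s. Because 2^p ≡ 1 (mod M_p), multiplying by 2^s is just a p-bit rotation, each squaring doubles the shift modulo p, and the final residue is rotated back before the res64 is reported. Two runs with different shifts push different bit patterns through the arithmetic yet must agree on the unshifted result, which is what makes them independent. A minimal Python sketch for an LL test on tiny exponents, using big-integer arithmetic instead of an FFT, so it only illustrates the shift bookkeeping (the function name is hypothetical):

```python
def ll_residue(p: int, shift: int = 0) -> int:
    """Lucas-Lehmer residue of M_p computed with a shifted representation, then unshifted."""
    m = (1 << p) - 1
    s = shift % p
    y = (4 * pow(2, s, m)) % m            # stored value = 4 * 2^s (mod M_p)
    for _ in range(p - 2):
        # (x * 2^s)^2 = x^2 * 2^(2s), so the constant 2 must be shifted by 2s too.
        y = (y * y - 2 * pow(2, 2 * s, m)) % m
        s = (2 * s) % p                   # shift doubles every iteration
    # Unshift: multiply by 2^(-s), which is 2^(p - s) since 2^p = 1 (mod M_p).
    return (y * pow(2, (p - s) % p, m)) % m

# Same final residue for every shift; zero exactly when M_p is prime.
print([ll_residue(11, s) for s in (0, 3, 7)], ll_residue(13, 5))
```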

[CODE]2020-05-14 18:31:17 asr2/radeonvii2-w2 140000177 OK 139800000 99.86%; 2590 us/it; ETA 0d 00:09; 420066ee63e325a2 (check 1.42s)
2020-05-14 18:39:59 asr2/radeonvii2-w2 140000177 OK 140000000 100.00%; 2604 us/it; ETA 0d 00:00; d33ef20fe4d7b3c8 (check 1.54s)
{"status":"C", "exponent":"140000177", "worktype":"PRP-3", "res64":"892fa228d6b157__", "residue-type":"1", "errors":{"gerbicz":"0"}, "fft-length":"8388608", "program":{"name":"gpuowl", "version":"v6.11-278-ga39cc1a"}, "user":"kriesel", "computer":"asr2/radeonvii2-w2", "timestamp":"2020-05-14 23:40:02 UTC"}[/CODE]
(The zero-shift result matches mrh.org's earlier zero-shift run, so the PrimeNet server rejects this doublecheck submission.) [URL]https://www.mersenne.org/report_exponent/?exp_lo=140000177&exp_hi=&full=1[/URL]
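The rejection described here can be expressed as a small check over result records. A hypothetical Python sketch: the `exponent` and `res64` fields come from gpuowl's JSON result line above, but the `shift-count` field name, the helper names, and the decision rule (matching res64 but different shift) are assumptions modeled on this post, not PrimeNet's actual code. gpuowl's line carries no shift field, so the sketch treats it as shift 0:

```python
import json

def parse_result(line: str) -> dict:
    """Extract the fields relevant to verification from a JSON result line."""
    r = json.loads(line)
    return {
        "exponent": int(r["exponent"]),
        "res64": r["res64"].lower(),
        # Assumption: missing shift field means an unshifted (shift 0) run, as with gpuowl here.
        "shift": int(r.get("shift-count", 0)),
    }

def counts_as_verification(first: dict, second: dict) -> bool:
    """Hypothetical rule: same exponent and res64, but computed with different shifts."""
    return (first["exponent"] == second["exponent"]
            and first["res64"] == second["res64"]
            and first["shift"] != second["shift"])
```

Under this rule, two zero-shift gpuowl runs can never verify each other, however well their residues agree.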

[CODE]2020-05-15 10:07:43 asr2/radeonvii 152171251 OK 152150000 99.99%; 2624 us/it; ETA 0d 00:01; 09166e3101f3f7a1 (check 1.53s) 28 errors
{"status":"C", "exponent":"152171251", "worktype":"PRP-3", "res64":"d4e28827ea97dd__", "residue-type":"1", "errors":{"gerbicz":"28"}, "fft-length":"8388608", "program":{"name":"gpuowl", "version":"v6.11-278-ga39cc1a"}, "user":"kriesel", "computer":"asr2/radeonvii", "timestamp":"2020-05-15 15:08:42 UTC"}[/CODE]
(The zero-shift result matches Mihai's earlier zero-shift run, so the PrimeNet server rejects this doublecheck submission.) [URL]https://www.mersenne.org/report_exponent/?exp_lo=152171251&exp_hi=&full=1[/URL]
The good news is that the PRP res64s on that one match to the extent they can be checked, despite 28 GEC errors being detected and the calculations redone from the previous check.
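A rough picture of why those 28 detected errors did no harm: the Gerbicz error check (GEC) keeps a running product d of the residues at every L-th iteration of the PRP squaring chain x ← x² (starting from x = 3) and verifies that d_new = 3 · d_old^(2^L) mod N. An error in the chain breaks this identity, so the block is discarded and recomputed from the last verified checkpoint. A toy Python sketch, assuming simulated random bit flips (applied to x only), the prime modulus 2^61 - 1, and a check after every block; real implementations check much less often, since each check costs L extra squarings. The function name and parameters are hypothetical:

```python
import random
from typing import Tuple

def prp_chain_with_gec(p: int, iters: int, L: int,
                       flip_rate: float = 0.0, seed: int = 0) -> Tuple[int, int]:
    """Compute 3^(2^iters) mod 2^p - 1 in blocks of L squarings, Gerbicz-checking each block.

    Returns (residue, number_of_blocks_redone). Simulated bit flips hit x only, never d.
    """
    assert iters % L == 0, "toy version: iters must be a multiple of L"
    n = (1 << p) - 1
    rng = random.Random(seed)
    x = d = 3                  # x = 3^(2^i); d = product of x at iterations 0, L, 2L, ...
    done = redone = 0
    while done < iters:
        y = x
        for _ in range(L):
            y = y * y % n
            if rng.random() < flip_rate:                # simulated hardware error
                y = (y ^ (1 << rng.randrange(p))) % n
        d_new = d * y % n
        if d_new == 3 * pow(d, 1 << L, n) % n:          # Gerbicz identity holds
            x, d, done = y, d_new, done + L             # accept block as new checkpoint
        else:
            redone += 1                                 # discard block, redo from checkpoint
    return x, redone

res, redone = prp_chain_with_gec(61, 60, 6, flip_rate=0.05, seed=7)
print(res == pow(3, 1 << 60, (1 << 61) - 1), redone)
```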
Well done, Dr. Gerbicz, Mihai, George, et al.

LaurV 2020-05-17 13:16

[QUOTE=kriesel;545523]Please add pseudorandom shift to gpuowl. Its absence is interfering with doublecheck sampling of higher exponents.[/QUOTE]
+1. As it is now, the owl is not very appealing... Besides some double-checking of old work, I can't do much with it, and soon I will [URL="https://www.mersenneforum.org/showthread.php?p=545606"]switch back[/URL] to my "forzes" and mfaktc, putting the "sevens" back in the store room.

kriesel 2020-05-17 13:53

[QUOTE=kriesel;545523]Mihai,

Please add pseudorandom shift to gpuowl. Its absence is interfering with doublecheck sampling of higher exponents. (I'm attempting to fill in double checks for LL and for PRP from the current state to where there's at least one of each for every million-exponent-range bin up to 200M, well ahead of the first-test wavefront. [URL]https://www.mersenneforum.org/showpost.php?p=501177&postcount=3[/URL] [URL]https://www.mersenneforum.org/showpost.php?p=501181&postcount=6[/URL])[/QUOTE]
Another one, matching Roland Clarkson's first test but rejected by the server:
[CODE]2020-05-15 00:28:04 asr2/radeonvii2 121642771 OK 121600000 99.96%; 2151 us/it; ETA 0d 00:02; f394cb39ecc84d04 (check 1.16s)

{"status":"C", "exponent":"121642771", "worktype":"PRP-3", "res64":"a3569f57e1792d__", "residue-type":"1", "errors":{"gerbicz":"0"}, "fft-length":"7340032", "program":{"name":"gpuowl", "version":"v6.11-278-ga39cc1a"}, "user":"kriesel", "computer":"asr2/radeonvii2", "timestamp":"2020-05-15 05:29:38 UTC"}

[/CODE][URL]https://www.mersenne.org/report_exponent/?exp_lo=121642771&exp_hi=&full=1[/URL]


All times are UTC.