Old 2020-05-09, 22:35   #2168
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
5×11×137 Posts

Quote:
Originally Posted by ewmayer View Post
Say I start making it the default ... if a run hits an expo which needs an even-higher extra-accuracy setting, will that automatically kick in, thus overriding the user's setting of the flag?
If you specify the MM2_CHAIN setting and a different MM2_CHAIN setting is auto-generated, I do not know which one will win.
Old 2020-05-09, 23:03   #2169
ewmayer
2ω=0
Sep 2002
República de California
26576₈ Posts

Quote:
Originally Posted by Prime95 View Post
If you specify the MM2_CHAIN setting and a different MM2_CHAIN setting is auto-generated, I do not know which one will win.
Guess I'll find out once the 5.5M wavefront gets closer to 106M ... if my calculations are correct the next step-up to MM2_CHAIN=2 is for p >105313332, so we're getting pretty close.

Related question for you & Mihai: Can the program determine at runtime if the MM2_CHAIN setting needs upping? Because ROEs are not necessarily monotonic with exponent (much depends on the particular DWT weights and their rounding-to-double) I found it useful in Mlucas to allow runtime-detection of such conditions, culminating in an upping of FFT length (and reset of the same-FFT-length ease-up params) if the highest setting proved insufficient for the exponent under test. But that all relies on per-iteration ROE data sampling.
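
The runtime-detection idea described above can be sketched roughly as follows. This is only an illustration of the escalation policy, not Mlucas or gpuowl code: the function names, the 0.4375 ROE limit, the maximum chain level and the list of FFT lengths are all assumptions made for the example.

```python
def roundoff_error(fft_outputs):
    """Largest distance from an integer among the convolution outputs (the ROE)."""
    return max(abs(v - round(v)) for v in fft_outputs)


def escalate(roe, chain, fft_index, fft_lengths, max_chain=2, roe_limit=0.4375):
    """Return the (chain, fft_index) pair to use for the next iterations.

    chain      -- current extra-accuracy level (hypothetical stand-in for MM2_CHAIN)
    fft_index  -- index of the current FFT length within fft_lengths
    max_chain  -- assumed highest accuracy level available
    roe_limit  -- assumed ROE at or above which the iteration is considered unsafe
    """
    if roe < roe_limit:
        return chain, fft_index          # current settings are adequate
    if chain < max_chain:
        return chain + 1, fft_index      # first try a more accurate setting
    if fft_index + 1 < len(fft_lengths):
        return 0, fft_index + 1          # then a longer FFT, resetting the ease-up level
    raise RuntimeError("no larger FFT length available")
```

The point of the ordering is that a higher accuracy setting is the cheaper remedy, so a longer FFT is tried only once the highest setting has proved insufficient.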

Oh, have you tried forcing MM2_CHAIN=1 in your own runs? It would be useful to see how broadly the "this runs faster" effect applies. Just Radeon VIIs? Just some subset thereof?

Last fiddled with by ewmayer on 2020-05-09 at 23:05
Old 2020-05-10, 01:48   #2170
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
5·11·137 Posts

Quote:
Originally Posted by ewmayer View Post
Guess I'll find out once the 5.5M wavefront gets closer to 106M ... if my calculations are correct the next step-up to MM2_CHAIN=2 is for p >105313332, so we're getting pretty close.
The next step is the addition of MM_CHAIN=1

Quote:
Related question for you & Mihai: Can the program determine at runtime if the MM2_CHAIN setting needs upping? Because ROEs are not necessarily monotonic with exponent (much depends on the particular DWT weights and their rounding-to-double) I found it useful in Mlucas to allow runtime-detection of such conditions, culminating in an upping of FFT length (and reset of the same-FFT-length ease-up params) if the highest setting proved insufficient for the exponent under test. But that all relies on per-iteration ROE data sampling.
I've found the average ROE does increase fairly predictably.

Quote:
Oh, have you tried forcing MM2_CHAIN=1 in your own runs? It would be useful to see how broadly the "this runs faster" effect applies. Just Radeon VIIs? Just some subset thereof?
Ah, the mysteries of the ROCm optimizer. Preda and I generally time MIDDLE=10 when selecting default optimizations. Last time I tested (in ROCm 3.1), omitting MM2_CHAIN was faster than MM2_CHAIN=1 for MIDDLE=10.
Old 2020-05-13, 21:13   #2171
ewmayer
2ω=0
Sep 2002
República de California
2·3²·647 Posts

Cross-posting from the "R7 @ newegg for $500" thread: the new build is alive. It uses the same Ubuntu 19.10 image I used to upgrade my Haswell system to host a Radeon VII (but that system remains on ROCm 2.10 for now), with ROCm 3.3 installed and the latest gpuowl built. I am having OpenCL issues, though: I first hit a missing-shared-lib error on program invocation, which Paul Underwood helped me look into. Here is the OpenCL-install info from the system as of last night:
Code:
apt-cache search libOpenCL
ocl-icd-libopencl1 - Generic OpenCL ICD Loader
libopencl-clang-dev - thin wrapper for clang -- development files
libopencl-clang9 - thin wrapper for clang
nvidia-libopencl1-331 - Transitional package for nvidia-libopencl1-340
nvidia-libopencl1-331-updates - Transitional package for nvidia-libopencl1-340
nvidia-libopencl1-340 - NVIDIA OpenCL Driver and ICD Loader library
nvidia-libopencl1-340-updates - Transitional package for nvidia-libopencl1-340
nvidia-libopencl1-384 - Transitional package for nvidia-headless-390
But none of the above was actually installed:
Code:
apt list --installed | grep libopencl1

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
Nothing further, so I did 'sudo apt install ocl-icd-libopencl1', which produces this entry in the above listing:
Code:
ocl-icd-libopencl1/eoan,now 2.2.11-1ubuntu1 amd64 [installed]
and solves the missing-shared-lib problem; now gpuowl starts but immediately coredumps:
Code:
2020-05-13 13:31:31 gpuowl v6.11-278-ga39cc1a
2020-05-13 13:31:31 Note: not found 'config.txt'
2020-05-13 13:31:31 device 0, unique id 'df7080c172fd5d6e'
2020-05-13 13:31:31 df7080c172fd5d6e 104954387 FFT: 5.50M 1K:11:256 (18.20 bpw)
2020-05-13 13:31:31 df7080c172fd5d6e Expected maximum carry32: 50D10000
Segmentation fault (core dumped)
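
As a side note on reading the "FFT: 5.50M 1K:11:256 (18.20 bpw)" line in this log, the figures can be cross-checked with a little arithmetic. The sketch below assumes that the three fields multiply together, times 2, to give the FFT length in words; that reading is inferred from the numbers in the log, not taken from the gpuowl source, and the helper names are made up for the example.

```python
def fft_words(width, middle, height):
    """Words in a WIDTH:MIDDLE:HEIGHT configuration (assumed: product times 2)."""
    return width * middle * height * 2


def bits_per_word(exponent, words):
    """Average number of bits of the exponent-sized residue stored per FFT word."""
    return exponent / words


words = fft_words(1024, 11, 256)          # the "1K:11:256" from the log
bpw = bits_per_word(104954387, words)     # exponent from the log
print(words / 2**20, round(bpw, 2))
```

Under that assumption the result agrees with the 5.50M length and the 18.20 bpw shown in the log.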

Last fiddled with by ewmayer on 2020-05-13 at 21:15
Old 2020-05-13, 21:16   #2172
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
7535₁₀ Posts

Did you install libncurses5? rocm-dev?

Does clinfo work?
Old 2020-05-13, 21:53   #2173
ewmayer
2ω=0
Sep 2002
República de California
2·3²·647 Posts

Quote:
Originally Posted by Prime95 View Post
Did you install libncurses5? rocm-dev?
I did the same install I used for the Haswell system, which IIRC was geared toward ROCm 3.0 (or maybe it was 3.1), which I later overrode to 2.10 to be able to run:

Code:
wget -qO - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | sudo apt-key add -
echo 'deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main' | sudo tee /etc/apt/sources.list.d/rocm.list
sudo apt update && sudo apt install rocm-dev

Quote:
Does clinfo work?
'clinfo' gives
Code:
Command 'clinfo' not found, but can be installed with:

sudo apt install clinfo
so did that; now 'clinfo' gives a coredump. As noted above, I already had the latest rocm-dev, but I also grabbed libncurses5 per your suggestion. Now clinfo gives the expected dumpage (compressed txt-file attached), and we are looking good on the running-gpuowl front: getting ~1355 us/iter for each of 2 runs @5.5M FFT, expos ~105M (i.e. I didn't need to use the force-MM2_CHAIN=1 speedup trick, since that is the default for the expos queued up on this new build). That is appreciably faster than the 1410 us/iter I'm getting for my 2 jobs on the Haswell-system Radeon card; I wonder if ROCm 3.3 (new build) versus 2.10 (Haswell) might be the difference.
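
To put the two timings in perspective, here is a small back-of-the-envelope sketch. It assumes a test needs roughly as many iterations as the exponent and that the rate stays steady; the helper names are invented for the example.

```python
US_PER_ITER_NEW = 1355   # new build, ROCm 3.3 (figure quoted in the post)
US_PER_ITER_OLD = 1410   # Haswell system, ROCm 2.10 (figure quoted in the post)


def speedup_percent(old_us, new_us):
    """Extra throughput of the faster run relative to the slower one, in percent."""
    return (old_us / new_us - 1.0) * 100.0


def days_for_test(exponent, us_per_iter):
    """Wall-clock days for about `exponent` iterations at a steady rate."""
    return exponent * us_per_iter * 1e-6 / 86400.0


print(f"{speedup_percent(US_PER_ITER_OLD, US_PER_ITER_NEW):.1f}% more throughput")
print(f"{days_for_test(104954387, US_PER_ITER_NEW):.2f} days per test on the new build")
print(f"{days_for_test(104954387, US_PER_ITER_OLD):.2f} days per test on the Haswell system")
```

That is about a 4% difference, or somewhat over an hour and a half per exponent of this size.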
Attached Files
File Type: bz2 clinfo.txt.bz2 (2.5 KB, 113 views)
Old 2020-05-14, 01:12   #2174
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
5×11×137 Posts

install libncurses5

Last fiddled with by ewmayer on 2020-05-14 at 01:55 Reason: I did - as noted "also grabbed the libncurses5 per your suggestion". Thanks!
Old 2020-05-14, 09:57   #2175
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,419 Posts

Test= no-go, DoubleCheck= ok

Code:
2020-05-14 04:53:20 gpuowl v6.11-278-ga39cc1a
2020-05-14 04:53:20 config: -user kriesel -cpu asr2/radeonvii3 -d 3 -use NO_ASM -maxAlloc 15000
2020-05-14 04:53:20 device 3, unique id ''
2020-05-14 04:53:20 asr2/radeonvii3 worktodo.txt line ignored: "Test=(AID),91493761,77,1"
2020-05-14 04:53:20 asr2/radeonvii3 Bye
Old 2020-05-16, 13:45   #2176
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5419₁₀ Posts

Mihai,

Please add pseudorandom shift to gpuowl. Its absence is interfering with doublecheck sampling of higher exponents. (I'm attempting to fill in double checks for LL and for PRP from the current state to where there's at least one of each for every million-exponent-range bin up to 200M, well ahead of the first-test wavefront. https://www.mersenneforum.org/showpo...77&postcount=3 https://www.mersenneforum.org/showpo...81&postcount=6) As Radeon VIIs become more common in the GIMPS fleet, and further conversion from cudalucas to gpuowl occurs on NVIDIA, the issue will become more common in LL and PRP at the wavefront also. It's tedious to check shifts one by one, and I missed a few.
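
For anyone unfamiliar with the shift being requested, here is a minimal sketch of the idea, using a plain Lucas-Lehmer test on small exponents for brevity (gpuowl itself runs PRP, and nothing here is taken from its source). The residue is stored multiplied by a power of two, each squaring doubles that shift modulo p, and the shift is removed at the end. Two runs with different shifts push different bit patterns through the hardware yet must agree on the final residue, which is what makes them independent checks of each other.

```python
def ll_residue(p, shift=0):
    """Lucas-Lehmer residue of M_p = 2**p - 1, computed on a shifted residue."""
    m = (1 << p) - 1
    k = shift % p
    x = (4 << k) % m                   # s_0 = 4, stored as s_0 * 2**k
    for _ in range(p - 2):
        k = (2 * k) % p                # squaring doubles the shift; 2**p == 1 (mod M_p)
        x = (x * x - (2 << k)) % m     # (s * 2**k_old)**2 - 2 * 2**k_new = (s**2 - 2) * 2**k_new
    return (x << (p - k)) % m          # undo the shift: multiply by 2**(p - k)


# Every starting shift gives the same final residue for a given exponent.
for p in (11, 13, 23):
    assert all(ll_residue(p, s) == ll_residue(p, 0) for s in range(p))
```

A zero residue means M_p is prime (p = 13 here), and a nonzero one means it is composite (p = 11 and p = 23).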

Code:
2020-05-14 18:31:17 asr2/radeonvii2-w2 140000177 OK 139800000  99.86%; 2590 us/it; ETA 0d 00:09; 420066ee63e325a2 (check 1.42s)
2020-05-14 18:39:59 asr2/radeonvii2-w2 140000177 OK 140000000 100.00%; 2604 us/it; ETA 0d 00:00; d33ef20fe4d7b3c8 (check 1.54s)
{"status":"C", "exponent":"140000177", "worktype":"PRP-3", "res64":"892fa228d6b157__", "residue-type":"1", "errors":{"gerbicz":"0"}, "fft-length":"8388608", "program":{"name":"gpuowl", "version":"v6.11-278-ga39cc1a"}, "user":"kriesel", "computer":"asr2/radeonvii2-w2", "timestamp":"2020-05-14 23:40:02 UTC"}
(zero shift matches mrh.org's zero shift earlier run, so PrimeNet server rejects this doublecheck submission) https://www.mersenne.org/report_expo...exp_hi=&full=1

Code:
2020-05-15 10:07:43 asr2/radeonvii 152171251 OK 152150000  99.99%; 2624 us/it; ETA 0d 00:01; 09166e3101f3f7a1 (check 1.53s) 28 errors
 {"status":"C", "exponent":"152171251", "worktype":"PRP-3", "res64":"d4e28827ea97dd__", "residue-type":"1", "errors":{"gerbicz":"28"}, "fft-length":"8388608", "program":{"name":"gpuowl", "version":"v6.11-278-ga39cc1a"}, "user":"kriesel", "computer":"asr2/radeonvii", "timestamp":"2020-05-15 15:08:42 UTC"}
(zero shift matches Mihai's zero shift earlier run, so PrimeNet server rejects this doublecheck submission) https://www.mersenne.org/report_expo...exp_hi=&full=1
The good news is the PRP res64s on that one match to the extent it can be checked, despite 28 GEC errors detected and calculations redone from the previous check.
Well done, Dr. Gerbicz, Mihai, George, et al.
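
For context on the GEC mentioned above, here is a toy sketch of a Gerbicz-style check on a chain of modular squarings. It keeps a running product d of the residues at block boundaries and verifies, at each boundary, that the new product equals the old one raised to 2^block times the starting value. The function name, the block size and the simulated bit flip are assumptions for illustration; a real program would roll back to the last good checkpoint rather than stop.

```python
def checked_squarings(m, n, block, corrupt_at=None):
    """Square x0 = 3 n times mod m with a Gerbicz-style check every `block` steps.

    Returns (residue, bad_block), where bad_block is None if every check passed,
    otherwise the 1-based index of the first block whose check failed.
    """
    x0 = 3
    x = x0
    d = x0                                   # product of x_0, x_L, x_2L, ... so far
    for i in range(1, n + 1):
        x = (x * x) % m
        if i == corrupt_at:
            x ^= 1                           # simulate a hardware bit flip
        if i % block == 0:
            expected = (pow(d, 1 << block, m) * x0) % m
            d = (d * x) % m                  # fold the new checkpoint into the product
            if d != expected:
                return x, i // block
    return x, None


m = 2**61 - 1
clean, bad = checked_squarings(m, 64, 8)
print(bad)                                   # no failed block in a clean run
faulty, bad = checked_squarings(m, 64, 8, corrupt_at=21)
print(bad)                                   # the block containing iteration 21
```

The check costs one extra multiplication per block plus an occasional block of squarings, which is why it can run throughout a test with little overhead.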

Last fiddled with by kriesel on 2020-05-16 at 13:51
Old 2020-05-17, 13:16   #2177
LaurV
Romulan Interpreter
Jun 2011
Thailand
9653₁₀ Posts

Quote:
Originally Posted by kriesel View Post
Please add pseudorandom shift to gpuowl. Its absence is interfering with doublecheck sampling of higher exponents.
+1. As it is now, the owl is not very appealing... Besides some double-checking of old work, I can't do much with it, and soon I will switch back to my "forzes" and mfaktc, putting the "sevens" back in the store room.
Old 2020-05-17, 13:53   #2178
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,419 Posts

Quote:
Originally Posted by kriesel View Post
Mihai,

Please add pseudorandom shift to gpuowl. Its absence is interfering with doublecheck sampling of higher exponents. (I'm attempting to fill in double checks for LL and for PRP from the current state to where there's at least one of each for every million-exponent-range bin up to 200M, well ahead of the first-test wavefront. https://www.mersenneforum.org/showpo...77&postcount=3 https://www.mersenneforum.org/showpo...81&postcount=6)
Another one, matching Roland Clarkson's first test but rejected by the server:
Code:
2020-05-15 00:28:04 asr2/radeonvii2 121642771 OK 121600000  99.96%; 2151 us/it; ETA 0d 00:02; f394cb39ecc84d04 (check 1.16s)

{"status":"C", "exponent":"121642771", "worktype":"PRP-3", "res64":"a3569f57e1792d__", "residue-type":"1", "errors":{"gerbicz":"0"}, "fft-length":"7340032", "program":{"name":"gpuowl", "version":"v6.11-278-ga39cc1a"}, "user":"kriesel", "computer":"asr2/radeonvii2", "timestamp":"2020-05-15 05:29:38 UTC"}
https://www.mersenne.org/report_expo...exp_hi=&full=1
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.