mersenneforum.org  

Go Back   mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing > GpuOwl

2019-07-08, 20:34   #1266
GP2
Sep 2003
5031₈ Posts

Quote:
Originally Posted by paulunderwood
Working mod 2^p+1 is almost as easy as 2^p-1. Then a final division by 3 to get mod (2^p+1)/3.

In mprime, a type-5 residue for Wagstaff simply calculates 3^(2^p) mod (2^p + 1). So I don't think you need to do a division.

Probably type-4 would also be applicable to Wagstaff? Or perhaps type-2, which is similar to type-4 except using N−1 instead of N+1.

Code:
2:	SPRP variant, N is PRP if a^((N-1)/2) = +/-1 mod N
4:	SPRP variant. N is PRP if a^((N+1)/2) = +/-a mod N
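
To make the two SPRP definitions above concrete, here is a minimal Python sketch that applies them to an arbitrary odd N. The function names are invented for illustration, and the code only mirrors the two one-line definitions; it is not taken from mprime, and whether these residue types carry over usefully to Wagstaff numbers remains the open question.

```python
def sprp_type2(n: int, a: int = 3) -> bool:
    """Type 2: N is PRP if a^((N-1)/2) = +/-1 mod N (n odd, n > a)."""
    r = pow(a, (n - 1) // 2, n)
    return r == 1 or r == n - 1


def sprp_type4(n: int, a: int = 3) -> bool:
    """Type 4: N is PRP if a^((N+1)/2) = +/-a mod N (n odd, n > a)."""
    r = pow(a, (n + 1) // 2, n)
    return r == a % n or r == (-a) % n


if __name__ == "__main__":
    wagstaff_7 = (2**7 + 1) // 3   # 43, prime
    mersenne_11 = 2**11 - 1        # 2047 = 23 * 89, composite
    for n in (wagstaff_7, mersenne_11):
        print(n, sprp_type2(n), sprp_type4(n))
```

For a prime N both checks agree, since the type-4 congruence is the type-2 congruence multiplied by a.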

2019-07-08, 22:22   #1267
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,419 Posts
gpuowl priorities

I'd like to see some P-1 related gpuowl fixes and extensions before Mihai tackles another endeavor such as extension to Wagstaff prp.

P-1 -time https://www.mersenneforum.org/showpo...postcount=1211

P-1 fail on 8GB RX480 https://www.mersenneforum.org/showpo...postcount=1208
Mihai replied in late May (post 1210) about planning to revisit P-1 memory management.

P-1 save and resume https://www.mersenneforum.org/showpo...postcount=1206

As things stand, I'm unable to successfully run v6.x gpuowl P-1 on AMD or NVIDIA.

Last fiddled with by kriesel on 2019-07-08 at 22:23

2019-07-09, 08:17   #1268
SELROC
3·2,543 Posts

ROCm 2.6 is out. Performance is similar to 2.5

2019-07-09, 09:00   #1269
SELROC
7×569 Posts

Quote:
Originally Posted by SELROC
Note: I have a script that quickly recovers after a power loss.

https://github.com/valeriob01/Mersen...7ef4ec01174485

Proposal for improvement of gpuowl checkpoint recovery: what the script does can be done in gpuowl with a few lines. If the checkpoint is invalid, load *-prev.owl, and overwrite the last checkpoint file.

Last fiddled with by SELROC on 2019-07-09 at 09:01

2019-07-09, 19:03   #1270
GP2
Sep 2003
5·11·47 Posts

Quote:
Originally Posted by kriesel
I'd like to see some P-1 related gpuowl fixes and extensions before Mihai tackles another endeavor such as extension to Wagstaff prp.

It's up to him to decide what he spends his time and effort doing. I was thinking that there might be some relatively trivial modification.

Like I mentioned earlier, the Wagstaff PRP calculation for type 5 is 3^(2^p) mod (2^p + 1) whereas for Mersenne (where type 1 and type 5 are the same thing), it's 3^(2^p − 2) mod (2^p − 1). I don't know if there is a similarly simple modification for type 4 or type 2 residues.

Since gpuOwL is a GitHub project, theoretically someone else could make the modification, possibly even forking from an earlier version that still used Mersenne type 1 residues.
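
The two formulas can be checked on small exponents with Python's built-in three-argument `pow`. This is only a sketch: the function names are invented here, and the Wagstaff acceptance test (comparing against 3^2 = 9 modulo N = (2^p + 1)/3) is an assumption derived from Fermat's little theorem, since 2^p = 3N - 1 = 3(N - 1) + 2. It is not taken from mprime's source.

```python
def mersenne_type1_is_prp(p: int, a: int = 3) -> bool:
    """Type 1 for Mersenne: a^(2^p - 2) mod (2^p - 1) must equal 1."""
    n = (1 << p) - 1
    return pow(a, n - 1, n) == 1


def wagstaff_type5_is_prp(p: int, a: int = 3) -> bool:
    """Type 5 for Wagstaff, odd prime p >= 5.

    Works mod 2^p + 1 = 3N throughout, with no division of the residue.
    Assumed acceptance test: a^(2^p) == a^2 (mod N), N = (2^p + 1) / 3.
    """
    m = (1 << p) + 1
    n = m // 3
    r = pow(a, 1 << p, m)          # a^(2^p) mod (2^p + 1)
    return (r - a * a) % n == 0


if __name__ == "__main__":
    odd_primes = [5, 7, 11, 13, 17, 19, 23, 29, 31]
    print([p for p in odd_primes if mersenne_type1_is_prp(p)])
    print([p for p in odd_primes if wagstaff_type5_is_prp(p)])
```

The only differences between the two functions are the sign in the modulus and the exponent, which is the "relatively trivial" part; doing the modular reduction efficiently inside an FFT multiply is a separate matter.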

2019-07-10, 00:51   #1271
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,419 Posts

Quote:
Originally Posted by GP2
It's up to him to decide what he spends his time and effort doing.

Of course. He volunteers his time, according to his talents and interests, like many others. None of us has a claim on him or each other, or authority to select one path versus another for him. To his credit, he sometimes accepts or asks for input from the user community. And if we users summarize outstanding issues or new desires, it can make him more efficient. Win-win.

Quote:
I was thinking that there might be some relatively trivial modification.

It seems to me that the power difference is trivial, but the mod difference is less so. A mod 2^p-1 result fits in p bits, and the reduction can be done rapidly in binary by adding the quotient (the part shifted right by p bits) to the remainder; mod 2^p+1 can't be done that way. It seems p+1 bits of storage, and subtracting the quotient after a p-bit right shift, would be in order. That in turn implies borrows rather than carries as in the existing code. But all that is from thinking in untransformed integer binary operand terms.
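
In plain integer terms, the contrast between the two reductions can be sketched as follows. This is an illustration on Python integers only, with made-up function names; it says nothing about how gpuowl's FFT-based carry kernels would need to change.

```python
def reduce_mod_2p_minus_1(x: int, p: int) -> int:
    """Reduce x >= 0 modulo 2^p - 1 with shifts, masks and adds only."""
    m = (1 << p) - 1
    while x > m:
        # 2^p = 1 (mod 2^p - 1): the high part folds in by addition (carries).
        x = (x & m) + (x >> p)
    return 0 if x == m else x


def reduce_mod_2p_plus_1(x: int, p: int) -> int:
    """Reduce 0 <= x < (2^p + 1)^2 modulo 2^p + 1.

    2^p = -1 (mod 2^p + 1): the high part is subtracted (borrows), and the
    result ranges over 0..2^p, which needs p + 1 bits.
    """
    n = (1 << p) + 1
    assert 0 <= x < n * n
    x = (x & (n - 2)) - (x >> p)   # n - 2 == 2^p - 1 is the low-bits mask
    while x < 0:                   # at most two corrections for this input range
        x += n
    return x


if __name__ == "__main__":
    p = 7
    a, b = 100, 121
    print(reduce_mod_2p_minus_1(a * b, p), (a * b) % (2**p - 1))
    print(reduce_mod_2p_plus_1(a * b, p), (a * b) % (2**p + 1))
```

The input bound on the second function covers the product of any two residues mod 2^p + 1, which is the case that arises in a squaring loop.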

Quote:
Like I mentioned earlier, the Wagstaff PRP calculation for type 5 is 3^(2^p) mod (2^p + 1) whereas for Mersenne (where type 1 and type 5 are the same thing), it's 3^(2^p − 2) mod (2^p − 1). I don't know if there is a similarly simple modification for type 4 or type 2 residues.

Since gpuOwL is a GitHub project, theoretically someone else could make the modification, possibly even forking from an earlier version that still used Mersenne type 1 residues.

Which would be ~gpuowl v1.5 to 3.9. https://www.mersenneforum.org/showpo...3&postcount=15
There are other ways to do Wagstaff, to ~920M, though maybe not as high a p as you'd like to go to if you're thinking of taking the new Mersenne conjecture testing further.
There are also other ways to do P-1 factoring on Mersennes, although not above ~432.5M in CUDAPm1 in practice, or ~920M in mprime/prime95, and not on OpenCL at all.

2019-07-10, 08:42   #1272
SELROC
2²·5·11·13 Posts

Quote:
Originally Posted by SELROC
ROCm 2.6 is out. Performance is similar to 2.5

ROCm 2.6 does not support Navi10; support is not expected until Linux 5.3 in September.

amdgpu-pro has support for Navi10.

2019-07-10, 12:39   #1273
preda
"Mihai Preda"
Apr 2015
3·457 Posts
residue-type 1 is back

Back by popular demand: residue-type 1. (in the most recent commit)

This means that GpuOwl's residue is now aligned with mprime's, and GpuOwl can be used to double-check mprime PRP results.

2019-07-10, 17:34   #1274
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,419 Posts

Quote:
Originally Posted by preda
Back by popular demand: residue-type 1. (in the most recent commit)

Built for Windows, tried on RX480.
-h works; -? doesn't unless a worktodo.txt exists.
Code:
>gpuowl-win -h
2019-07-10 10:29:22 gpuowl v6.5-84-g30c0508

Command line options:

-dir <folder>      : specify work directory (containing worktodo.txt, results.txt, config.txt, gpuowl.log)
-user <name>       : specify the user name.
-cpu  <name>       : specify the hardware name.
-time              : display kernel profiling information.
-fft <size>        : specify FFT size, such as: 5000K, 4M, +2, -1.
-block <value>     : PRP GEC block size. Default 1000. Smaller block is slower but detects errors sooner.
-log <step>        : log every <step> iterations, default 20000. Multiple of 10000.
-carry long|short  : force carry type. Short carry may be faster, but requires high bits/word.
-B1                : P-1 B1 bound, default 500000
-B2                : P-1 B2 bound, default B1 * 30
-rB2               : ratio of B2 to B1. Default 30, used only if B2 is not explicitly set
-prp <exponent>    : run a single PRP test and exit, ignoring worktodo.txt
-pm1 <exponent>    : run a single P-1 test and exit, ignoring worktodo.txt
-results <file>    : name of results file, default 'results.txt'
-iters <N>         : run next PRP test for <N> iterations and exit. Multiple of 10000.
-use NEW_FFT8,OLD_FFT5,NEW_FFT10: comma separated list of defines, see the #if tests in gpuowl.cl (used for perf tuning).
-device <N>        : select a specific device:
 0 : Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
 1 : gfx804-8x1203-@3:0.0 Radeon 550 Series

FFT Configurations:
FFT    8K [  0.01M -    0.18M]  64-64
FFT   32K [  0.05M -    0.68M]  64-256 256-64
FFT   64K [  0.10M -    1.34M]  64-512 512-64
FFT  128K [  0.20M -    2.63M]  1K-64 64-1K 256-256
FFT  192K [  0.29M -    3.91M]  64-256-6
FFT  224K [  0.34M -    4.54M]  64-256-7
FFT  256K [  0.39M -    5.18M]  64-2K 256-512 512-256 2K-64
FFT  288K [  0.44M -    5.81M]  64-256-9
FFT  320K [  0.49M -    6.44M]  64-256-10
FFT  352K [  0.54M -    7.06M]  64-256-11
FFT  384K [  0.59M -    7.69M]  64-256-12 64-512-6
FFT  448K [  0.69M -    8.94M]  64-512-7
FFT  512K [  0.79M -   10.18M]  1K-256 256-1K 512-512 4K-64
FFT  576K [  0.88M -   11.42M]  64-512-9
FFT  640K [  0.98M -   12.66M]  64-512-10
FFT  704K [  1.08M -   13.89M]  64-512-11
FFT  768K [  1.18M -   15.12M]  64-512-12 64-1K-6 256-256-6
FFT  896K [  1.38M -   17.57M]  64-1K-7 256-256-7
FFT    1M [  1.57M -   20.02M]  1K-512 256-2K 512-1K 2K-256
FFT 1152K [  1.77M -   22.45M]  64-1K-9 256-256-9
FFT 1280K [  1.97M -   24.88M]  64-1K-10 256-256-10
FFT 1408K [  2.16M -   27.31M]  64-1K-11 256-256-11
FFT 1536K [  2.36M -   29.72M]  64-1K-12 64-2K-6 256-256-12 256-512-6 512-256-6
FFT 1792K [  2.75M -   34.54M]  64-2K-7 256-512-7 512-256-7
FFT    2M [  3.15M -   39.34M]  1K-1K 512-2K 2K-512 4K-256
FFT 2304K [  3.54M -   44.13M]  64-2K-9 256-512-9 512-256-9
FFT 2560K [  3.93M -   48.90M]  64-2K-10 256-512-10 512-256-10
FFT 2816K [  4.33M -   53.66M]  64-2K-11 256-512-11 512-256-11
FFT    3M [  4.72M -   58.41M]  1K-256-6 64-2K-12 256-512-12 256-1K-6 512-256-12 512-512-6
FFT 3584K [  5.51M -   67.87M]  1K-256-7 256-1K-7 512-512-7
FFT    4M [  6.29M -   77.30M]  1K-2K 2K-1K 4K-512
FFT 4608K [  7.08M -   86.70M]  1K-256-9 256-1K-9 512-512-9
FFT    5M [  7.86M -   96.07M]  1K-256-10 256-1K-10 512-512-10
FFT 5632K [  8.65M -  105.41M]  1K-256-11 256-1K-11 512-512-11
FFT    6M [  9.44M -  114.74M]  1K-256-12 1K-512-6 256-1K-12 256-2K-6 512-512-12 512-1K-6 2K-256-6
FFT    7M [ 11.01M -  133.32M]  1K-512-7 256-2K-7 512-1K-7 2K-256-7
FFT    8M [ 12.58M -  151.83M]  2K-2K 4K-1K
FFT    9M [ 14.16M -  170.28M]  1K-512-9 256-2K-9 512-1K-9 2K-256-9
FFT   10M [ 15.73M -  188.68M]  1K-512-10 256-2K-10 512-1K-10 2K-256-10
FFT   11M [ 17.30M -  207.02M]  1K-512-11 256-2K-11 512-1K-11 2K-256-11
FFT   12M [ 18.87M -  225.32M]  1K-512-12 1K-1K-6 256-2K-12 512-1K-12 512-2K-6 2K-256-12 2K-512-6 4K-256-6
FFT   14M [ 22.02M -  261.80M]  1K-1K-7 512-2K-7 2K-512-7 4K-256-7
FFT   16M [ 25.17M -  298.13M]  4K-2K
FFT   18M [ 28.31M -  334.34M]  1K-1K-9 512-2K-9 2K-512-9 4K-256-9
FFT   20M [ 31.46M -  370.44M]  1K-1K-10 512-2K-10 2K-512-10 4K-256-10
FFT   22M [ 34.60M -  406.43M]  1K-1K-11 512-2K-11 2K-512-11 4K-256-11
FFT   24M [ 37.75M -  442.34M]  1K-1K-12 1K-2K-6 512-2K-12 2K-512-12 2K-1K-6 4K-256-12 4K-512-6
FFT   28M [ 44.04M -  513.91M]  1K-2K-7 2K-1K-7 4K-512-7
FFT   36M [ 56.62M -  656.22M]  1K-2K-9 2K-1K-9 4K-512-9
FFT   40M [ 62.91M -  727.03M]  1K-2K-10 2K-1K-10 4K-512-10
FFT   44M [ 69.21M -  797.64M]  1K-2K-11 2K-1K-11 4K-512-11
FFT   48M [ 75.50M -  868.07M]  1K-2K-12 2K-1K-12 2K-2K-6 4K-512-12 4K-1K-6
FFT   56M [ 88.08M - 1008.44M]  2K-2K-7 4K-1K-7
FFT   72M [113.25M - 1287.53M]  2K-2K-9 4K-1K-9
FFT   80M [125.83M - 1426.38M]  2K-2K-10 4K-1K-10
FFT   88M [138.41M - 1564.83M]  2K-2K-11 4K-1K-11
FFT   96M [150.99M - 1702.92M]  2K-2K-12 4K-1K-12 4K-2K-6
FFT  112M [176.16M - 1978.12M]  4K-2K-7
FFT  144M [226.49M - 2525.23M]  4K-2K-9
FFT  160M [251.66M - 2797.39M]  4K-2K-10
FFT  176M [276.82M - 3068.76M]  4K-2K-11
FFT  192M [301.99M - 3339.40M]  4K-2K-12
2019-07-10 10:29:30 Exiting because "help"
2019-07-10 10:29:30 Bye

>gpuowl-win -?
2019-07-10 10:29:43 gpuowl v6.5-84-g30c0508
2019-07-10 10:29:43 Note: no config.txt file found
2019-07-10 10:29:43 config: -?
2019-07-10 10:29:43 Can't open 'worktodo.txt' (mode 'rb')
2019-07-10 10:29:43 Bye
Quick low known mersenne test passes; P-1 attempt on 47.8M fails.
Code:
>gpuowl-win -device 0 -use ORIG_X2
2019-07-10 11:06:54 gpuowl v6.5-84-g30c0508
2019-07-10 11:06:54 Note: no config.txt file found
2019-07-10 11:06:54 config: -device 0 -use ORIG_X2
2019-07-10 11:06:54 1257787 FFT 64K: Width 8x8, Height 64x8; 19.19 bits/word
2019-07-10 11:06:54 using short carry kernels
2019-07-10 11:07:02 OpenCL args "-DEXP=1257787u -DWIDTH=64u -DSMALL_HEIGHT=512u -DMIDDLE=1u -DWEIGHT_STEP=0xe.00d75658c47c8p-3 -DIWEIGHT_STEP=0x9.2405b0b5f2d88p-4 -DWEIGHT_BIGSTEP=0xc.5672a115506d8p-3 -DIWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-4 -DORIG_X2=1 -DORIG_X2=1  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-07-10 11:07:06 OpenCL compilation in 4069 ms
2019-07-10 11:07:06 1257787.owl not found, starting from the beginning.
2019-07-10 11:07:07 1257787 OK     2000  0.16%;  207 us/sq; ETA 0d 00:04; 46c7ab6803e1a365 (check 0.23s)
2019-07-10 11:07:11 1257787       20000  1.59%;  210 us/sq; ETA 0d 00:04; de7035c3244acc9b
2019-07-10 11:07:15 1257787       40000  3.18%;  210 us/sq; ETA 0d 00:04; 8e655f023b66fde1
2019-07-10 11:07:19 1257787       60000  4.77%;  210 us/sq; ETA 0d 00:04; e62c225bd51c0bf1
2019-07-10 11:07:23 1257787       80000  6.36%;  210 us/sq; ETA 0d 00:04; 2a37fdc214c2e7c0
2019-07-10 11:07:27 1257787      100000  7.95%;  210 us/sq; ETA 0d 00:04; 09f25999ff3326ca
...
2019-07-10 11:11:14 1257787     1180000 93.80%;  210 us/sq; ETA 0d 00:00; cfea93b53dd3f424
2019-07-10 11:11:19 1257787     1200000 95.39%;  210 us/sq; ETA 0d 00:00; 5a5b25f08d9912e4
2019-07-10 11:11:23 1257787     1220000 96.98%;  210 us/sq; ETA 0d 00:00; 66d4bd30b2ea4a7d
2019-07-10 11:11:27 1257787     1240000 98.57%;  209 us/sq; ETA 0d 00:00; 5c19c247002fe45c
2019-07-10 11:11:31 PP  1257787 / 1257787, 0000000000000001
2019-07-10 11:11:31 1257787 OK  1258000 100.00%;  211 us/sq; ETA 0d 00:00; f4d273818ecfa167 (check 0.23s)
2019-07-10 11:11:31 {"exponent":"1257787", "worktype":"PRP-3", "status":"P", "program":{"name":"gpuowl", "version":"v6.5-84-g30c0508"}, "timestamp":"2019-07-10 16:11:31 UTC", "aid":"0", "fft-length":65536, "res64":"0000000000000001", "residue-type":1}
2019-07-10 11:11:31 47840659 FFT 2560K: Width 8x8, Height 256x8, Middle 10; 18.25 bits/word
2019-07-10 11:11:31 using short carry kernels
2019-07-10 11:11:31 OpenCL args "-DEXP=47840659u -DWIDTH=64u -DSMALL_HEIGHT=2048u -DMIDDLE=10u -DWEIGHT_STEP=0xd.74e0985678c5p-3 -DIWEIGHT_STEP=0x9.8318a69b48b5p-4 -DWEIGHT_BIGSTEP=0xc.5672a115506d8p-3 -DIWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-4 -DORIG_X2=1 -DORIG_X2=1  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-07-10 11:11:35 OpenCL compilation in 4221 ms
2019-07-10 11:11:36 47840659 P-1 GPU RAM fits 374 stage2 buffers @ 20.0 MB each
2019-07-10 11:11:36 47840659 P-1 using 360 stage2 buffers (8 rounds)
2019-07-10 11:11:36 P-1 (B1=440000, B2=8800000, D=30030): primes 553065, expanded 560752, doubles 96745 (left 362408), singles 359575, total 456320 (83%)
2019-07-10 11:11:36 47840659 P-1 stage2: 279 blocks starting at block 15 (456320 selected)
2019-07-10 11:11:36 47840659 P-1 starting stage1
2019-07-10 11:12:33 47840659       10000  1.58%; 5632 us/sq; ETA 0d 00:59; c56022afd3a79281
2019-07-10 11:13:29 47840659       20000  3.15%; 5632 us/sq; ETA 0d 00:58; 57fe8bc3f01d07de
2019-07-10 11:14:25 47840659       30000  4.73%; 5630 us/sq; ETA 0d 00:57; 1c811f9a541e4c93
...
2019-07-10 12:04:10 47840659      560000 88.21%; 5632 us/sq; ETA 0d 00:07; e537d47dbf93bbb4
2019-07-10 12:05:06 47840659      570000 89.78%; 5634 us/sq; ETA 0d 00:06; f379136e785f8c92
2019-07-10 12:06:03 47840659      580000 91.36%; 5635 us/sq; ETA 0d 00:05; eb7fec5d09ff1974
2019-07-10 12:06:59 47840659      590000 92.93%; 5631 us/sq; ETA 0d 00:04; d0ea9a92d7208708
2019-07-10 12:07:55 47840659      600000 94.51%; 5630 us/sq; ETA 0d 00:03; 0247296cf97caff4
2019-07-10 12:08:52 47840659      610000 96.08%; 5630 us/sq; ETA 0d 00:02; 55ab076cedf5dee2
2019-07-10 12:09:48 47840659      620000 97.66%; 5630 us/sq; ETA 0d 00:01; 9af5dced9077c32a
2019-07-10 12:10:44 47840659      630000 99.23%; 5635 us/sq; ETA 0d 00:00; 7716930f904d8987
2019-07-10 12:11:12 P-1 stage2 too little memory 6983 MB for 360 buffers of 20971520 b
2019-07-10 12:11:52 Exiting because "P-1 not enough memory"
2019-07-10 12:11:52 Bye
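
The numbers in the stage 2 failure are consistent with simple arithmetic, sketched below. The 8 bytes per FFT word is an assumption (one double per word) that happens to reproduce the 20971520 b buffer size in the log, MB is taken to mean 2^20 bytes as the "20.0 MB" figure suggests, and the helper names are hypothetical. The sketch does not explain why 374 buffers were estimated to fit at startup while only 6983 MB was found at stage 2.

```python
MIB = 2**20


def stage2_buffer_bytes(fft_words: int, bytes_per_word: int = 8) -> int:
    """Size of one stage 2 buffer, assuming one 8-byte double per FFT word."""
    return fft_words * bytes_per_word


def buffers_that_fit(available_mib: int, buffer_bytes: int) -> int:
    """How many whole buffers fit in the reported available memory."""
    return (available_mib * MIB) // buffer_bytes


if __name__ == "__main__":
    buf = stage2_buffer_bytes(2560 * 1024)      # "FFT 2560K" in the log
    print(buf)                                  # bytes per buffer
    print(360 * buf // MIB)                     # MiB wanted for 360 buffers
    print(buffers_that_fit(6983, buf))          # buffers that fit in 6983 MB
```

Under these assumptions the 360 requested buffers need more than the 6983 MB reported, which matches the "too little memory" message.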
Windows build archive attached. The included readme.md is modified somewhat, with the following changes:
minor spelling and grammar corrections
additional worktodo.txt entry examples, including P-1 and no-aid forms
Attached Files
File Type: 7z gpuowl-win-v6.5-84-g30c0508.7z (399.3 KB, 153 views)

2019-07-11, 05:23   #1275
SELROC
10001100100110₂ Posts

Quote:
Originally Posted by SELROC
Proposal for improvement of gpuowl checkpoint recovery: what the script does can be done in gpuowl with a few lines. If the checkpoint is invalid, load *-prev.owl, and overwrite the last checkpoint file.

PS: This proposal should handle the case where a power loss occurs while the checkpoint is being written, leaving the file invalid. In that case the file is never closed and remains without an end-of-file mark. On reboot a filesystem check is run, which truncates the checkpoint file to length zero.
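
The fallback logic can be sketched in a few lines of Python. This is not the linked script and not gpuowl code: the function names are hypothetical, the file names `<exponent>.owl` and `<exponent>-prev.owl` are assumed from the log output and the proposal above, and the validity check only detects the missing or zero-length case described here. A real check would also have to validate the file contents.

```python
import shutil
from pathlib import Path
from typing import Optional


def is_valid_checkpoint(path: Path) -> bool:
    """Hypothetical check: the file exists and was not truncated to length zero."""
    return path.is_file() and path.stat().st_size > 0


def recover_checkpoint(work_dir: Path, exponent: int) -> Optional[Path]:
    """Return a usable checkpoint, restoring it from the -prev file if needed."""
    current = work_dir / f"{exponent}.owl"
    previous = work_dir / f"{exponent}-prev.owl"
    if is_valid_checkpoint(current):
        return current
    if is_valid_checkpoint(previous):
        # Overwrite the damaged checkpoint with the previous good one.
        shutil.copyfile(previous, current)
        return current
    return None  # nothing to resume from; the test starts from the beginning


if __name__ == "__main__":
    print(recover_checkpoint(Path("."), 1257787))
```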

2019-07-11, 18:41   #1276
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
5·11·137 Posts

Happy me.

I tried to return the XFX Radeon VII for a replacement. Amazon was out of stock so they simply refunded the purchase. I ordered an Asrock Radeon VII instead. Installed today and preliminary results look great. First, the stock voltage is 50mV less. A memory overclock of 15% with a 40mV undervolting gives no errors during a short test. 0.85ms / iteration at 5M FFT! Longer testing required of course.

A question for the Linux gurus:

I use "crontab -e" to run mprime at boot. This does not work for gpuowl, which must run as root. I tried "sudo crontab -e", but either I messed up the entries or it does not work as I expected. What is the recommended way for root to start gpuowl at boot?

Last fiddled with by Prime95 on 2019-07-11 at 18:48

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.