[QUOTE=paulunderwood;521034]:goodposting:
Working mod 2^p+1 is almost as easy as 2^p-1. Then a final division by 3 to get mod (2^p+1)/3.[/QUOTE]
In mprime, a type-5 residue for Wagstaff simply calculates [c]3^(2^p) mod (2^p + 1)[/c]. So I don't think you need to do a division. Probably type-4 would also be applicable to Wagstaff? Or perhaps type-2, which is similar to type-4 except using N−1 instead of N+1.
[CODE]
2: SPRP variant, N is PRP if a^((N-1)/2) = +/-1 mod N
4: SPRP variant. N is PRP if a^((N+1)/2) = +/-a mod N
[/CODE] |
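To make the type-5 formula concrete, here is a minimal Python sketch. It is not mprime or gpuowl code, and the function names are made up for illustration. It computes 3^(2^p) mod (2^p + 1) and only afterwards reduces modulo N = (2^p + 1)/3. The comparison against 9 is an assumption derived here rather than taken from the posts: if N is a prime other than 3, Fermat gives 3^(N−1) ≡ 1 (mod N), and 3(N−1) = 2^p − 2, so 3^(2^p) ≡ 9 (mod N).

```python
def wagstaff_type5_residue(p):
    """Return 3^(2^p) mod (2^p + 1), the type-5 quantity quoted above."""
    modulus = (1 << p) + 1
    return pow(3, 1 << p, modulus)


def wagstaff_is_prp(p):
    """Base-3 PRP check for N = (2^p + 1)/3, odd p > 3 (hypothetical helper).

    Assumption: the pass condition is residue == 9 (mod N), which follows
    from Fermat's little theorem when N is prime.
    """
    n = ((1 << p) + 1) // 3
    return wagstaff_type5_residue(p) % n == 9 % n


if __name__ == "__main__":
    # Exponents of some small known Wagstaff primes.
    for p in (5, 7, 11, 13, 17, 19, 23, 31, 43):
        print(p, wagstaff_is_prp(p))
```

All the squarings happen modulo 2^p + 1; the only step involving N is the single reduction at the end.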
gpuowl priorities
I'd like to see some P-1 related gpuowl fixes and extensions before Mihai tackles another endeavor such as extension to Wagstaff prp.
P-1 -time: [URL]https://www.mersenneforum.org/showpost.php?p=517911&postcount=1211[/URL]
P-1 fail on 8GB RX480: [URL]https://www.mersenneforum.org/showpost.php?p=517853&postcount=1208[/URL] Mihai replied in late May (post 1210) about planning to revisit P-1 memory management.
P-1 save and resume: [URL]https://www.mersenneforum.org/showpost.php?p=517846&postcount=1206[/URL]
As things stand, I'm unable to successfully run v6.x gpuowl P-1 on AMD or NVIDIA. |
ROCm 2.6 is out. Performance is similar to 2.5
|
[QUOTE=SELROC;520775]Note: I have a script that quickly recovers after a power loss.
[URL]https://github.com/valeriob01/Mersenne-gpu-computing-node/commit/e90de7d656c60ddbb9eac294977ef4ec01174485[/URL][/QUOTE] Proposal for improvement of gpuowl checkpoint recovery: what the script does can be done in gpuowl with a few lines. If the checkpoint is invalid, load *-prev.owl, and overwrite the last checkpoint file. |
[QUOTE=kriesel;521053]I'd like to see some P-1 related gpuowl fixes and extensions before Mihai tackles another endeavor such as extension to Wagstaff prp.[/QUOTE]
It's up to him to decide what he spends his time and effort doing. I was thinking that there might be some relatively trivial modification. Like I mentioned earlier, the Wagstaff PRP calculation for type 5 is [c]3^(2^p) mod (2^p + 1)[/c] whereas for Mersenne (where type 1 and type 5 are the same thing), it's [c]3^(2^p − 2) mod (2^p − 1)[/c]. I don't know if there is a similarly simple modification for type 4 or type 2 residues. Since gpuOwL is a GitHub project, theoretically someone else could make the modification, possibly even forking from an earlier version that still used Mersenne type 1 residues. |
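For comparison, the Mersenne type-1 calculation can be sketched the same way. This is a minimal Python illustration with hypothetical function names, not code from gpuowl or mprime. It assumes the usual base-3 Fermat pass condition, residue equal to 1.

```python
def mersenne_type1_residue(p):
    """Return 3^(2^p - 2) mod (2^p - 1), i.e. 3^(N-1) mod N for N = 2^p - 1."""
    n = (1 << p) - 1
    return pow(3, (1 << p) - 2, n)


def mersenne_is_prp(p):
    """Base-3 Fermat PRP check for 2^p - 1, p > 2 (hypothetical helper)."""
    return mersenne_type1_residue(p) == 1


if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13, 17, 19, 31):
        print(p, mersenne_is_prp(p))
```

Multiplying the type-1 residue by 9 gives 3^(2^p) mod (2^p − 1), so on the Mersenne side the two exponent conventions differ only by that constant factor.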
[QUOTE=GP2;521118]It's up to him to decide what he spends his time and effort doing. [/QUOTE]Of course. He volunteers his time, according to his talents and interests, like many others. None of us has a claim on him or each other, or authority to select one path versus another for him. To his credit, he sometimes accepts or asks for input from the user community. And if we users summarize outstanding issues or new desires, it can make him more efficient. Win-win.
[QUOTE]I was thinking that there might be some relatively trivial modification.[/QUOTE]It seems to me that the power difference is trivial, but the mod difference is less so. A result mod 2[SUP]p[/SUP]-1 fits in p bits, and the reduction can be done rapidly in binary by shifting the quotient (the bits above the low p) right by p bits and adding it to the remainder (the low p bits); mod 2[SUP]p[/SUP]+1 can't be done that way. It seems that p+1 bits of storage, and subtracting the quotient after a p-bit right shift, would be in order. That in turn implies borrows rather than the carries of the existing code. But all that comes from thinking in terms of untransformed binary integer operands.
[QUOTE] Like I mentioned earlier, the Wagstaff PRP calculation for type 5 is [c]3^(2^p) mod (2^p + 1)[/c] whereas for Mersenne (where type 1 and type 5 are the same thing), it's [c]3^(2^p − 2) mod (2^p − 1)[/c]. I don't know if there is a similarly simple modification for type 4 or type 2 residues. Since gpuOwL is a GitHub project, theoretically someone else could make the modification, possibly even forking from an earlier version that still used Mersenne type 1 residues.[/QUOTE]Which would be ~gpuowl v1.5 to 3.9. [URL]https://www.mersenneforum.org/showpost.php?p=519603&postcount=15[/URL]
There are other ways to do Wagstaff, up to ~920M, though maybe not to as high a p as you'd like if you're thinking of taking the new Mersenne conjecture testing further. There are also other ways to do P-1 factoring on Mersennes, although not above ~432.5M in CUDAPm1 in practice, or ~920M in mprime/prime95, and not on OpenCL at all. |
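The shift-and-add versus shift-and-subtract point can be sketched on plain Python integers. As the post itself says, this is the untransformed-integer view only and says nothing about how an FFT-based program handles the reduction. The function names are hypothetical.

```python
def mod_2p_minus_1(x, p):
    """Reduce non-negative x modulo 2^p - 1 by folding, using 2^p = 1 (mod 2^p - 1)."""
    m = (1 << p) - 1
    while x > m:
        x = (x & m) + (x >> p)    # low p bits plus everything above them
    return 0 if x == m else x     # 2^p - 1 itself is congruent to 0


def mod_2p_plus_1(x, p):
    """Reduce non-negative x modulo 2^p + 1 by folding, using 2^p = -1 (mod 2^p + 1)."""
    n = (1 << p) + 1
    mask = (1 << p) - 1
    acc, sign = 0, 1
    while x:
        acc += sign * (x & mask)  # p-bit chunks enter with alternating sign
        sign = -sign
        x >>= p
    while acc < 0:                # the borrow side of the subtraction
        acc += n
    while acc >= n:
        acc -= n
    return acc                    # may equal 2^p, hence p+1 bits of storage


if __name__ == "__main__":
    x = 123456789
    print(mod_2p_minus_1(x, 7), x % 127)
    print(mod_2p_plus_1(x, 7), x % 129)
```

The second function can return the value 2^p, which is why the result needs p+1 bits rather than p.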
[QUOTE=SELROC;521088]ROCm 2.6 is out. Performance is similar to 2.5[/QUOTE]
ROCm version 2.6 has no Navi10 support; that has to wait for Linux 5.3 in September. amdgpu-pro does have support for Navi10. |
residue-type 1 is back
Back by popular demand: residue-type 1. (in the most recent commit)
This means that GpuOwl's residue is now aligned with mprime's, and GpuOwl can be used to double-check mprime PRP results. |
1 Attachment(s)
[QUOTE=preda;521194]Back by popular demand: residue-type 1. (in the most recent commit)[/QUOTE]Built for Windows, tried on RX480.
-h works, -? doesn't without a worktodo.txt existing.
[CODE]>gpuowl-win -h
2019-07-10 10:29:22 gpuowl v6.5-84-g30c0508
Command line options:

-dir <folder> : specify work directory (containing worktodo.txt, results.txt, config.txt, gpuowl.log)
-user <name> : specify the user name.
-cpu <name> : specify the hardware name.
-time : display kernel profiling information.
-fft <size> : specify FFT size, such as: 5000K, 4M, +2, -1.
-block <value> : PRP GEC block size. Default 1000. Smaller block is slower but detects errors sooner.
-log <step> : log every <step> iterations, default 20000. Multiple of 10000.
-carry long|short : force carry type. Short carry may be faster, but requires high bits/word.
-B1 : P-1 B1 bound, default 500000
-B2 : P-1 B2 bound, default B1 * 30
-rB2 : ratio of B2 to B1. Default 30, used only if B2 is not explicitly set
-prp <exponent> : run a single PRP test and exit, ignoring worktodo.txt
-pm1 <exponent> : run a single P-1 test and exit, ignoring worktodo.txt
-results <file> : name of results file, default 'results.txt'
-iters <N> : run next PRP test for <N> iterations and exit. Multiple of 10000.
-use NEW_FFT8,OLD_FFT5,NEW_FFT10: comma separated list of defines, see the #if tests in gpuowl.cl (used for perf tuning).
-device <N> : select a specific device:
0 : Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
1 : gfx804-8x1203-@3:0.0 Radeon 550 Series

FFT Configurations:
FFT 8K [ 0.01M - 0.18M] 64-64
FFT 32K [ 0.05M - 0.68M] 64-256 256-64
FFT 64K [ 0.10M - 1.34M] 64-512 512-64
FFT 128K [ 0.20M - 2.63M] 1K-64 64-1K 256-256
FFT 192K [ 0.29M - 3.91M] 64-256-6
FFT 224K [ 0.34M - 4.54M] 64-256-7
FFT 256K [ 0.39M - 5.18M] 64-2K 256-512 512-256 2K-64
FFT 288K [ 0.44M - 5.81M] 64-256-9
FFT 320K [ 0.49M - 6.44M] 64-256-10
FFT 352K [ 0.54M - 7.06M] 64-256-11
FFT 384K [ 0.59M - 7.69M] 64-256-12 64-512-6
FFT 448K [ 0.69M - 8.94M] 64-512-7
FFT 512K [ 0.79M - 10.18M] 1K-256 256-1K 512-512 4K-64
FFT 576K [ 0.88M - 11.42M] 64-512-9
FFT 640K [ 0.98M - 12.66M] 64-512-10
FFT 704K [ 1.08M - 13.89M] 64-512-11
FFT 768K [ 1.18M - 15.12M] 64-512-12 64-1K-6 256-256-6
FFT 896K [ 1.38M - 17.57M] 64-1K-7 256-256-7
FFT 1M [ 1.57M - 20.02M] 1K-512 256-2K 512-1K 2K-256
FFT 1152K [ 1.77M - 22.45M] 64-1K-9 256-256-9
FFT 1280K [ 1.97M - 24.88M] 64-1K-10 256-256-10
FFT 1408K [ 2.16M - 27.31M] 64-1K-11 256-256-11
FFT 1536K [ 2.36M - 29.72M] 64-1K-12 64-2K-6 256-256-12 256-512-6 512-256-6
FFT 1792K [ 2.75M - 34.54M] 64-2K-7 256-512-7 512-256-7
FFT 2M [ 3.15M - 39.34M] 1K-1K 512-2K 2K-512 4K-256
FFT 2304K [ 3.54M - 44.13M] 64-2K-9 256-512-9 512-256-9
FFT 2560K [ 3.93M - 48.90M] 64-2K-10 256-512-10 512-256-10
FFT 2816K [ 4.33M - 53.66M] 64-2K-11 256-512-11 512-256-11
FFT 3M [ 4.72M - 58.41M] 1K-256-6 64-2K-12 256-512-12 256-1K-6 512-256-12 512-512-6
FFT 3584K [ 5.51M - 67.87M] 1K-256-7 256-1K-7 512-512-7
FFT 4M [ 6.29M - 77.30M] 1K-2K 2K-1K 4K-512
FFT 4608K [ 7.08M - 86.70M] 1K-256-9 256-1K-9 512-512-9
FFT 5M [ 7.86M - 96.07M] 1K-256-10 256-1K-10 512-512-10
FFT 5632K [ 8.65M - 105.41M] 1K-256-11 256-1K-11 512-512-11
FFT 6M [ 9.44M - 114.74M] 1K-256-12 1K-512-6 256-1K-12 256-2K-6 512-512-12 512-1K-6 2K-256-6
FFT 7M [ 11.01M - 133.32M] 1K-512-7 256-2K-7 512-1K-7 2K-256-7
FFT 8M [ 12.58M - 151.83M] 2K-2K 4K-1K
FFT 9M [ 14.16M - 170.28M] 1K-512-9 256-2K-9 512-1K-9 2K-256-9
FFT 10M [ 15.73M - 188.68M] 1K-512-10 256-2K-10 512-1K-10 2K-256-10
FFT 11M [ 17.30M - 207.02M] 1K-512-11 256-2K-11 512-1K-11 2K-256-11
FFT 12M [ 18.87M - 225.32M] 1K-512-12 1K-1K-6 256-2K-12 512-1K-12 512-2K-6 2K-256-12 2K-512-6 4K-256-6
FFT 14M [ 22.02M - 261.80M] 1K-1K-7 512-2K-7 2K-512-7 4K-256-7
FFT 16M [ 25.17M - 298.13M] 4K-2K
FFT 18M [ 28.31M - 334.34M] 1K-1K-9 512-2K-9 2K-512-9 4K-256-9
FFT 20M [ 31.46M - 370.44M] 1K-1K-10 512-2K-10 2K-512-10 4K-256-10
FFT 22M [ 34.60M - 406.43M] 1K-1K-11 512-2K-11 2K-512-11 4K-256-11
FFT 24M [ 37.75M - 442.34M] 1K-1K-12 1K-2K-6 512-2K-12 2K-512-12 2K-1K-6 4K-256-12 4K-512-6
FFT 28M [ 44.04M - 513.91M] 1K-2K-7 2K-1K-7 4K-512-7
FFT 36M [ 56.62M - 656.22M] 1K-2K-9 2K-1K-9 4K-512-9
FFT 40M [ 62.91M - 727.03M] 1K-2K-10 2K-1K-10 4K-512-10
FFT 44M [ 69.21M - 797.64M] 1K-2K-11 2K-1K-11 4K-512-11
FFT 48M [ 75.50M - 868.07M] 1K-2K-12 2K-1K-12 2K-2K-6 4K-512-12 4K-1K-6
FFT 56M [ 88.08M - 1008.44M] 2K-2K-7 4K-1K-7
FFT 72M [113.25M - 1287.53M] 2K-2K-9 4K-1K-9
FFT 80M [125.83M - 1426.38M] 2K-2K-10 4K-1K-10
FFT 88M [138.41M - 1564.83M] 2K-2K-11 4K-1K-11
FFT 96M [150.99M - 1702.92M] 2K-2K-12 4K-1K-12 4K-2K-6
FFT 112M [176.16M - 1978.12M] 4K-2K-7
FFT 144M [226.49M - 2525.23M] 4K-2K-9
FFT 160M [251.66M - 2797.39M] 4K-2K-10
FFT 176M [276.82M - 3068.76M] 4K-2K-11
FFT 192M [301.99M - 3339.40M] 4K-2K-12
2019-07-10 10:29:30 Exiting because "help"
2019-07-10 10:29:30 Bye

>gpuowl-win -?
2019-07-10 10:29:43 gpuowl v6.5-84-g30c0508
2019-07-10 10:29:43 Note: no config.txt file found
2019-07-10 10:29:43 config: -?
2019-07-10 10:29:43 Can't open 'worktodo.txt' (mode 'rb')
2019-07-10 10:29:43 Bye
[/CODE]
Quick low known Mersenne test passes; P-1 attempt on 47.8M fails.
[CODE]>gpuowl-win -device 0 -use ORIG_X2
2019-07-10 11:06:54 gpuowl v6.5-84-g30c0508
2019-07-10 11:06:54 Note: no config.txt file found
2019-07-10 11:06:54 config: -device 0 -use ORIG_X2
2019-07-10 11:06:54 1257787 FFT 64K: Width 8x8, Height 64x8; 19.19 bits/word
2019-07-10 11:06:54 using short carry kernels
2019-07-10 11:07:02 OpenCL args "-DEXP=1257787u -DWIDTH=64u -DSMALL_HEIGHT=512u -DMIDDLE=1u -DWEIGHT_STEP=0xe.00d75658c47c8p-3 -DIWEIGHT_STEP=0x9.2405b0b5f2d88p-4 -DWEIGHT_BIGSTEP=0xc.5672a115506d8p-3 -DIWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-4 -DORIG_X2=1 -DORIG_X2=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-07-10 11:07:06 OpenCL compilation in 4069 ms
2019-07-10 11:07:06 1257787.owl not found, starting from the beginning.
2019-07-10 11:07:07 1257787 OK 2000 0.16%; 207 us/sq; ETA 0d 00:04; 46c7ab6803e1a365 (check 0.23s)
2019-07-10 11:07:11 1257787 20000 1.59%; 210 us/sq; ETA 0d 00:04; de7035c3244acc9b
2019-07-10 11:07:15 1257787 40000 3.18%; 210 us/sq; ETA 0d 00:04; 8e655f023b66fde1
2019-07-10 11:07:19 1257787 60000 4.77%; 210 us/sq; ETA 0d 00:04; e62c225bd51c0bf1
2019-07-10 11:07:23 1257787 80000 6.36%; 210 us/sq; ETA 0d 00:04; 2a37fdc214c2e7c0
2019-07-10 11:07:27 1257787 100000 7.95%; 210 us/sq; ETA 0d 00:04; 09f25999ff3326ca
...
2019-07-10 11:11:14 1257787 1180000 93.80%; 210 us/sq; ETA 0d 00:00; cfea93b53dd3f424
2019-07-10 11:11:19 1257787 1200000 95.39%; 210 us/sq; ETA 0d 00:00; 5a5b25f08d9912e4
2019-07-10 11:11:23 1257787 1220000 96.98%; 210 us/sq; ETA 0d 00:00; 66d4bd30b2ea4a7d
2019-07-10 11:11:27 1257787 1240000 98.57%; 209 us/sq; ETA 0d 00:00; 5c19c247002fe45c
2019-07-10 11:11:31 PP 1257787 / 1257787, 0000000000000001
2019-07-10 11:11:31 1257787 OK 1258000 100.00%; 211 us/sq; ETA 0d 00:00; f4d273818ecfa167 (check 0.23s)
2019-07-10 11:11:31 {"exponent":"1257787", "worktype":"PRP-3", "status":"P", "program":{"name":"gpuowl", "version":"v6.5-84-g30c0508"}, "timestamp":"2019-07-10 16:11:31 UTC", "aid":"0", "fft-length":65536, "res64":"0000000000000001", "residue-type":1}
2019-07-10 11:11:31 47840659 FFT 2560K: Width 8x8, Height 256x8, Middle 10; 18.25 bits/word
2019-07-10 11:11:31 using short carry kernels
2019-07-10 11:11:31 OpenCL args "-DEXP=47840659u -DWIDTH=64u -DSMALL_HEIGHT=2048u -DMIDDLE=10u -DWEIGHT_STEP=0xd.74e0985678c5p-3 -DIWEIGHT_STEP=0x9.8318a69b48b5p-4 -DWEIGHT_BIGSTEP=0xc.5672a115506d8p-3 -DIWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-4 -DORIG_X2=1 -DORIG_X2=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-07-10 11:11:35 OpenCL compilation in 4221 ms
2019-07-10 11:11:36 47840659 P-1 GPU RAM fits 374 stage2 buffers @ 20.0 MB each
2019-07-10 11:11:36 47840659 P-1 using 360 stage2 buffers (8 rounds)
2019-07-10 11:11:36 P-1 (B1=440000, B2=8800000, D=30030): primes 553065, expanded 560752, doubles 96745 (left 362408), singles 359575, total 456320 (83%)
2019-07-10 11:11:36 47840659 P-1 stage2: 279 blocks starting at block 15 (456320 selected)
2019-07-10 11:11:36 47840659 P-1 starting stage1
2019-07-10 11:12:33 47840659 10000 1.58%; 5632 us/sq; ETA 0d 00:59; c56022afd3a79281
2019-07-10 11:13:29 47840659 20000 3.15%; 5632 us/sq; ETA 0d 00:58; 57fe8bc3f01d07de
2019-07-10 11:14:25 47840659 30000 4.73%; 5630 us/sq; ETA 0d 00:57; 1c811f9a541e4c93
...
2019-07-10 12:04:10 47840659 560000 88.21%; 5632 us/sq; ETA 0d 00:07; e537d47dbf93bbb4
2019-07-10 12:05:06 47840659 570000 89.78%; 5634 us/sq; ETA 0d 00:06; f379136e785f8c92
2019-07-10 12:06:03 47840659 580000 91.36%; 5635 us/sq; ETA 0d 00:05; eb7fec5d09ff1974
2019-07-10 12:06:59 47840659 590000 92.93%; 5631 us/sq; ETA 0d 00:04; d0ea9a92d7208708
2019-07-10 12:07:55 47840659 600000 94.51%; 5630 us/sq; ETA 0d 00:03; 0247296cf97caff4
2019-07-10 12:08:52 47840659 610000 96.08%; 5630 us/sq; ETA 0d 00:02; 55ab076cedf5dee2
2019-07-10 12:09:48 47840659 620000 97.66%; 5630 us/sq; ETA 0d 00:01; 9af5dced9077c32a
2019-07-10 12:10:44 47840659 630000 99.23%; 5635 us/sq; ETA 0d 00:00; 7716930f904d8987
2019-07-10 12:11:12 P-1 stage2 too little memory 6983 MB for 360 buffers of 20971520 b
2019-07-10 12:11:52 Exiting because "P-1 not enough memory"
2019-07-10 12:11:52 Bye
[/CODE]
Windows build zip file attached. The readme.md included is modified somewhat, for the following changes:
minor spell check and grammar check
worktodo.txt entry additional examples, including P-1 and no-aid forms |
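The memory figures in the P-1 log can be checked with a little arithmetic. The Python sketch below rests on two assumptions: each stage-2 buffer holds one 8-byte double per FFT word, and the log's "MB" means 2^20 bytes. Neither is stated in the log, but together they reproduce its 20971520-byte and 20.0 MB figures. The helper names are hypothetical.

```python
MB = 1 << 20                 # assumption: the log's "MB" means 2^20 bytes
BYTES_PER_WORD = 8           # assumption: one double per FFT word


def bits_per_word(exponent, fft_words):
    """Average number of exponent bits carried by each FFT word."""
    return exponent / fft_words


def buffer_bytes(fft_words):
    """Size of one stage-2 buffer under the assumption above."""
    return fft_words * BYTES_PER_WORD


fft_words = 2560 * 1024                    # "FFT 2560K" from the log
per_buffer = buffer_bytes(fft_words)       # bytes per stage-2 buffer
needed_mb = 360 * per_buffer / MB          # for the 360 buffers gpuowl planned
fit_count = (6983 * MB) // per_buffer      # buffers that fit in 6983 MB

print(f"{bits_per_word(47840659, fft_words):.2f} bits/word")
print(per_buffer, "bytes per buffer")
print(needed_mb, "MB needed;", fit_count, "buffers fit in 6983 MB")
```

Under these assumptions the 360 planned buffers need 7200 MB, while the 6983 MB reported at stage 2 has room for only 349. The 374 buffers estimated at startup would correspond to 7480 MB.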
[QUOTE=SELROC;521090]Proposal for improvement of gpuowl checkpoint recovery: what the script does can be done in gpuowl with a few lines. If the checkpoint is invalid, load *-prev.owl, and overwrite the last checkpoint file.[/QUOTE]
PS: This proposal should handle the case where the checkpoint file is invalid after a power loss because the power failed while the checkpoint was being written. In that case the file is never closed and is left without the end-of-file mark. On reboot a filesystem check is run, which truncates the checkpoint file to length zero. |
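The proposed recovery step might look roughly like this. It is a minimal Python sketch of the logic only, not gpuowl code. The validity test (file exists and is non-empty) is a stand-in assumption based on the zero-length failure described above, the [c]<exponent>.owl[/c] and [c]<exponent>-prev.owl[/c] names follow the posts and logs, and the function name is hypothetical.

```python
import os
import shutil


def recover_checkpoint(workdir, exponent):
    """Return the path of a usable checkpoint, or None to start from scratch."""
    current = os.path.join(workdir, f"{exponent}.owl")
    previous = os.path.join(workdir, f"{exponent}-prev.owl")

    def looks_valid(path):
        # Stand-in check: a real loader would also verify the file contents.
        return os.path.isfile(path) and os.path.getsize(path) > 0

    if looks_valid(current):
        return current
    if looks_valid(previous):
        # Overwrite the broken checkpoint with the previous good one.
        shutil.copyfile(previous, current)
        return current
    return None
```

Copying rather than renaming leaves the -prev file in place, so a second failure during recovery still has something to fall back on.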
Happy me.
I tried to return the XFX Radeon VII for a replacement. Amazon was out of stock so they simply refunded the purchase. I ordered an ASRock Radeon VII instead. Installed today and preliminary results look great. First, the stock voltage is 50mV less. A memory overclock of 15% with a 40mV undervolting gives no errors during a short test. 0.85ms / iteration at 5M FFT! Longer testing required of course.
A question for the Linux gurus: I use "crontab -e" to run mprime at boot. This does not work for gpuowl which must run as root. I tried "sudo crontab -e", but either messed up the entries or this does not work as I expected. What is the recommended way for root to start gpuowl at boot? |