#1266
Sep 2003
2585₁₀ Posts
Quote:
Probably type-4 would also be applicable to Wagstaff? Or perhaps type-2, which is similar to type-4 except using N−1 instead of N+1.
Code:
2: SPRP variant, N is PRP if a^((N-1)/2) = +/-1 mod N
4: SPRP variant. N is PRP if a^((N+1)/2) = +/-a mod N
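For illustration only, here is a minimal Python sketch of the two quoted conditions using the built-in three-argument pow. The function names are hypothetical, and this is not how prime95 or gpuOwL compute residues. It shows that for a prime N with gcd(a, N) = 1, the type-4 condition follows from type 2 by multiplying both sides by a (Euler's criterion gives ±1 for type 2).

```python
def prp_type2(n: int, a: int = 3) -> bool:
    """Type 2 (SPRP variant): a^((n-1)/2) == +/-1 (mod n), n odd."""
    r = pow(a, (n - 1) // 2, n)
    return r == 1 or r == n - 1


def prp_type4(n: int, a: int = 3) -> bool:
    """Type 4 (SPRP variant): a^((n+1)/2) == +/-a (mod n), n odd."""
    r = pow(a, (n + 1) // 2, n)
    return r == a % n or r == (-a) % n


if __name__ == "__main__":
    # W = (2^p + 1) / 3 is a Wagstaff prime for p = 5, 7, 11, 13;
    # 91 = 7 * 13 is a composite control case.
    for p in (5, 7, 11, 13):
        w = (2**p + 1) // 3
        print(f"p={p} W={w} type2={prp_type2(w)} type4={prp_type4(w)}")
    print(f"91: type2={prp_type2(91)} type4={prp_type4(91)}")
```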

#1267
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,419 Posts
I'd like to see some P-1 related gpuowl fixes and extensions before Mihai tackles another endeavor, such as extending it to Wagstaff PRP.
P-1 -time: https://www.mersenneforum.org/showpo...postcount=1211
P-1 fail on 8GB RX480: https://www.mersenneforum.org/showpo...postcount=1208 (Mihai replied in late May, in post 1210, that he plans to revisit P-1 memory management.)
P-1 save and resume: https://www.mersenneforum.org/showpo...postcount=1206
As things stand, I'm unable to successfully run v6.x gpuowl P-1 on AMD or NVIDIA.
Last fiddled with by kriesel on 2019-07-08 at 22:23

#1268
7,883 Posts
ROCm 2.6 is out. Performance is similar to 2.5.

#1269
3²·2³·47 Posts
Quote:
Proposal for improving gpuowl checkpoint recovery: what the script does could be done in gpuowl with a few lines of code. If the checkpoint is invalid, load *-prev.owl and use it to overwrite the last checkpoint file.
Last fiddled with by SELROC on 2019-07-09 at 09:01
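A minimal Python sketch of the proposed fallback. It assumes checkpoints are named <exponent>.owl and <exponent>-prev.owl, following the *-prev.owl pattern above. The validity test is a hypothetical stand-in that only rejects missing or zero-length files; gpuowl's real checkpoint format and integrity checks are not modelled here.

```python
import os
import shutil
import tempfile
from typing import Optional


def is_valid_checkpoint(path: str) -> bool:
    """Hypothetical check: the file exists and is not empty.

    A real check would also parse the header and verify the data; this
    only catches missing or zero-length files.
    """
    return os.path.isfile(path) and os.path.getsize(path) > 0


def recover_checkpoint(exponent: int, directory: str = ".") -> Optional[str]:
    """Return a usable checkpoint path for `exponent`, or None to start over.

    If <exponent>.owl is invalid but <exponent>-prev.owl is valid, the
    previous checkpoint is copied over the last one.
    """
    last = os.path.join(directory, f"{exponent}.owl")
    prev = os.path.join(directory, f"{exponent}-prev.owl")
    if is_valid_checkpoint(last):
        return last
    if is_valid_checkpoint(prev):
        shutil.copyfile(prev, last)  # overwrite the bad last checkpoint
        return last
    return None  # nothing usable: start from the beginning


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # Simulate a last checkpoint truncated to zero length.
        open(os.path.join(d, "1257787.owl"), "wb").close()
        with open(os.path.join(d, "1257787-prev.owl"), "wb") as f:
            f.write(b"older but intact checkpoint")
        print(recover_checkpoint(1257787, d))
```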

#1270
Sep 2003
5×11×47 Posts
Quote:
Like I mentioned earlier, the Wagstaff PRP calculation for type 5 is 3^(2^p) mod (2^p + 1), whereas for Mersenne (where type 1 and type 5 are the same thing) it's 3^(2^p − 2) mod (2^p − 1). I don't know whether there is a similarly simple modification for type 4 or type 2 residues. Since gpuOwL is a GitHub project, in principle someone else could make the modification, possibly even forking from an earlier version that still used Mersenne type 1 residues.
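The two expressions above are easy to evaluate for small p with Python's built-in pow. This sketch only checks the arithmetic; the function names are hypothetical. For a Mersenne prime the first value is 1. For a Wagstaff prime W = (2^p + 1)/3 with p > 3, Fermat's little theorem gives 3^(2^p − 1) = 3^(3W − 2) ≡ 3 (mod W), so 3^(2^p) ≡ 9 (mod 3W = 2^p + 1). Whether gpuOwL or prime95 would report exactly these values is not stated in the thread.

```python
def mersenne_type1_residue(p: int) -> int:
    """3^(2^p - 2) mod (2^p - 1); equals 1 when 2^p - 1 is prime."""
    m = 2**p - 1
    return pow(3, m - 1, m)


def wagstaff_type5_residue(p: int) -> int:
    """3^(2^p) mod (2^p + 1), the Wagstaff calculation described above."""
    return pow(3, 2**p, 2**p + 1)


if __name__ == "__main__":
    # 2^7 - 1 and 2^13 - 1 are prime; 2^11 - 1 = 2047 = 23 * 89 is not.
    for p in (7, 11, 13):
        print(f"Mersenne p={p}: {mersenne_type1_residue(p)}")
    # (2^p + 1) / 3 is prime for each of these p.
    for p in (5, 7, 11, 13, 17):
        print(f"Wagstaff p={p}: {wagstaff_type5_residue(p)}")
```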

#1271
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,419 Posts
Of course. He volunteers his time, according to his talents and interests, like many others. None of us has a claim on him or on each other, or the authority to choose one path over another for him. To his credit, he sometimes accepts or asks for input from the user community. And if we users summarize outstanding issues and feature requests, it can make him more efficient. Win-win.
Quote:
Quote:
There are other ways to do Wagstaff tests, up to ~920M, though maybe not as high a p as you'd like if you're thinking of taking the new Mersenne conjecture testing further. There are also other ways to do P-1 factoring on Mersennes, although not above ~432.5M in CUDAPm1 in practice, or ~920M in mprime/prime95, and not on OpenCL at all.

#1272
1101111000101₂ Posts

#1273
"Mihai Preda"
Apr 2015
3×457 Posts
Back by popular demand: residue-type 1 (in the most recent commit).
This means that GpuOwl's residue is now aligned with mprime's, and GpuOwl can be used to double-check mprime PRP results.
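A double-check comes down to comparing the 64-bit residues (res64) reported by the two programs. Here is a small Python sketch under stated assumptions. It assumes the type-1 res64 is the low 64 bits of 3^(2^p − 2) mod (2^p − 1), printed as 16 hex digits, which agrees with the 0000000000000001 gpuowl reports for the Mersenne prime M1257787 in a later post. Both helper names are hypothetical. The comparison ignores hex letter case in case the two programs print it differently.

```python
def prp_res64(p: int, base: int = 3) -> str:
    """Low 64 bits of base^(2^p - 2) mod (2^p - 1), as 16 lowercase hex digits."""
    m = 2**p - 1
    residue = pow(base, m - 1, m)
    return format(residue & (2**64 - 1), "016x")


def res64_match(a: str, b: str) -> bool:
    """Compare two res64 strings, ignoring surrounding spaces and hex case."""
    return a.strip().lower() == b.strip().lower()


if __name__ == "__main__":
    print(prp_res64(7))   # M7 = 127 is prime
    print(prp_res64(11))  # M11 = 2047 = 23 * 89 is composite
    print(res64_match(prp_res64(13), "0000000000000001"))
```

Real exponents such as 1257787 are far too large for plain Python pow; the point is only the residue definition and the comparison.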

#1274
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,419 Posts
Quote:
-h works, but -? doesn't work unless a worktodo.txt file exists.
Code:
>gpuowl-win -h
2019-07-10 10:29:22 gpuowl v6.5-84-g30c0508
Command line options:
-dir <folder> : specify work directory (containing worktodo.txt, results.txt, config.txt, gpuowl.log)
-user <name> : specify the user name.
-cpu <name> : specify the hardware name.
-time : display kernel profiling information.
-fft <size> : specify FFT size, such as: 5000K, 4M, +2, -1.
-block <value> : PRP GEC block size. Default 1000. Smaller block is slower but detects errors sooner.
-log <step> : log every <step> iterations, default 20000. Multiple of 10000.
-carry long|short : force carry type. Short carry may be faster, but requires high bits/word.
-B1 : P-1 B1 bound, default 500000
-B2 : P-1 B2 bound, default B1 * 30
-rB2 : ratio of B2 to B1. Default 30, used only if B2 is not explicitly set
-prp <exponent> : run a single PRP test and exit, ignoring worktodo.txt
-pm1 <exponent> : run a single P-1 test and exit, ignoring worktodo.txt
-results <file> : name of results file, default 'results.txt'
-iters <N> : run next PRP test for <N> iterations and exit. Multiple of 10000.
-use NEW_FFT8,OLD_FFT5,NEW_FFT10: comma separated list of defines, see the #if tests in gpuowl.cl (used for perf tuning).
-device <N> : select a specific device:
 0 : Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
 1 : gfx804-8x1203-@3:0.0 Radeon 550 Series
FFT Configurations:
FFT 8K [ 0.01M - 0.18M] 64-64
FFT 32K [ 0.05M - 0.68M] 64-256 256-64
FFT 64K [ 0.10M - 1.34M] 64-512 512-64
FFT 128K [ 0.20M - 2.63M] 1K-64 64-1K 256-256
FFT 192K [ 0.29M - 3.91M] 64-256-6
FFT 224K [ 0.34M - 4.54M] 64-256-7
FFT 256K [ 0.39M - 5.18M] 64-2K 256-512 512-256 2K-64
FFT 288K [ 0.44M - 5.81M] 64-256-9
FFT 320K [ 0.49M - 6.44M] 64-256-10
FFT 352K [ 0.54M - 7.06M] 64-256-11
FFT 384K [ 0.59M - 7.69M] 64-256-12 64-512-6
FFT 448K [ 0.69M - 8.94M] 64-512-7
FFT 512K [ 0.79M - 10.18M] 1K-256 256-1K 512-512 4K-64
FFT 576K [ 0.88M - 11.42M] 64-512-9
FFT 640K [ 0.98M - 12.66M] 64-512-10
FFT 704K [ 1.08M - 13.89M] 64-512-11
FFT 768K [ 1.18M - 15.12M] 64-512-12 64-1K-6 256-256-6
FFT 896K [ 1.38M - 17.57M] 64-1K-7 256-256-7
FFT 1M [ 1.57M - 20.02M] 1K-512 256-2K 512-1K 2K-256
FFT 1152K [ 1.77M - 22.45M] 64-1K-9 256-256-9
FFT 1280K [ 1.97M - 24.88M] 64-1K-10 256-256-10
FFT 1408K [ 2.16M - 27.31M] 64-1K-11 256-256-11
FFT 1536K [ 2.36M - 29.72M] 64-1K-12 64-2K-6 256-256-12 256-512-6 512-256-6
FFT 1792K [ 2.75M - 34.54M] 64-2K-7 256-512-7 512-256-7
FFT 2M [ 3.15M - 39.34M] 1K-1K 512-2K 2K-512 4K-256
FFT 2304K [ 3.54M - 44.13M] 64-2K-9 256-512-9 512-256-9
FFT 2560K [ 3.93M - 48.90M] 64-2K-10 256-512-10 512-256-10
FFT 2816K [ 4.33M - 53.66M] 64-2K-11 256-512-11 512-256-11
FFT 3M [ 4.72M - 58.41M] 1K-256-6 64-2K-12 256-512-12 256-1K-6 512-256-12 512-512-6
FFT 3584K [ 5.51M - 67.87M] 1K-256-7 256-1K-7 512-512-7
FFT 4M [ 6.29M - 77.30M] 1K-2K 2K-1K 4K-512
FFT 4608K [ 7.08M - 86.70M] 1K-256-9 256-1K-9 512-512-9
FFT 5M [ 7.86M - 96.07M] 1K-256-10 256-1K-10 512-512-10
FFT 5632K [ 8.65M - 105.41M] 1K-256-11 256-1K-11 512-512-11
FFT 6M [ 9.44M - 114.74M] 1K-256-12 1K-512-6 256-1K-12 256-2K-6 512-512-12 512-1K-6 2K-256-6
FFT 7M [ 11.01M - 133.32M] 1K-512-7 256-2K-7 512-1K-7 2K-256-7
FFT 8M [ 12.58M - 151.83M] 2K-2K 4K-1K
FFT 9M [ 14.16M - 170.28M] 1K-512-9 256-2K-9 512-1K-9 2K-256-9
FFT 10M [ 15.73M - 188.68M] 1K-512-10 256-2K-10 512-1K-10 2K-256-10
FFT 11M [ 17.30M - 207.02M] 1K-512-11 256-2K-11 512-1K-11 2K-256-11
FFT 12M [ 18.87M - 225.32M] 1K-512-12 1K-1K-6 256-2K-12 512-1K-12 512-2K-6 2K-256-12 2K-512-6 4K-256-6
FFT 14M [ 22.02M - 261.80M] 1K-1K-7 512-2K-7 2K-512-7 4K-256-7
FFT 16M [ 25.17M - 298.13M] 4K-2K
FFT 18M [ 28.31M - 334.34M] 1K-1K-9 512-2K-9 2K-512-9 4K-256-9
FFT 20M [ 31.46M - 370.44M] 1K-1K-10 512-2K-10 2K-512-10 4K-256-10
FFT 22M [ 34.60M - 406.43M] 1K-1K-11 512-2K-11 2K-512-11 4K-256-11
FFT 24M [ 37.75M - 442.34M] 1K-1K-12 1K-2K-6 512-2K-12 2K-512-12 2K-1K-6 4K-256-12 4K-512-6
FFT 28M [ 44.04M - 513.91M] 1K-2K-7 2K-1K-7 4K-512-7
FFT 36M [ 56.62M - 656.22M] 1K-2K-9 2K-1K-9 4K-512-9
FFT 40M [ 62.91M - 727.03M] 1K-2K-10 2K-1K-10 4K-512-10
FFT 44M [ 69.21M - 797.64M] 1K-2K-11 2K-1K-11 4K-512-11
FFT 48M [ 75.50M - 868.07M] 1K-2K-12 2K-1K-12 2K-2K-6 4K-512-12 4K-1K-6
FFT 56M [ 88.08M - 1008.44M] 2K-2K-7 4K-1K-7
FFT 72M [113.25M - 1287.53M] 2K-2K-9 4K-1K-9
FFT 80M [125.83M - 1426.38M] 2K-2K-10 4K-1K-10
FFT 88M [138.41M - 1564.83M] 2K-2K-11 4K-1K-11
FFT 96M [150.99M - 1702.92M] 2K-2K-12 4K-1K-12 4K-2K-6
FFT 112M [176.16M - 1978.12M] 4K-2K-7
FFT 144M [226.49M - 2525.23M] 4K-2K-9
FFT 160M [251.66M - 2797.39M] 4K-2K-10
FFT 176M [276.82M - 3068.76M] 4K-2K-11
FFT 192M [301.99M - 3339.40M] 4K-2K-12
2019-07-10 10:29:30 Exiting because "help"
2019-07-10 10:29:30 Bye
>gpuowl-win -?
2019-07-10 10:29:43 gpuowl v6.5-84-g30c0508
2019-07-10 10:29:43 Note: no config.txt file found
2019-07-10 10:29:43 config: -?
2019-07-10 10:29:43 Can't open 'worktodo.txt' (mode 'rb')
2019-07-10 10:29:43 Bye
Code:
>gpuowl-win -device 0 -use ORIG_X2
2019-07-10 11:06:54 gpuowl v6.5-84-g30c0508
2019-07-10 11:06:54 Note: no config.txt file found
2019-07-10 11:06:54 config: -device 0 -use ORIG_X2
2019-07-10 11:06:54 1257787 FFT 64K: Width 8x8, Height 64x8; 19.19 bits/word
2019-07-10 11:06:54 using short carry kernels
2019-07-10 11:07:02 OpenCL args "-DEXP=1257787u -DWIDTH=64u -DSMALL_HEIGHT=512u -DMIDDLE=1u -DWEIGHT_STEP=0xe.00d75658c47c8p-3 -DIWEIGHT_STEP=0x9.2405b0b5f2d88p-4 -DWEIGHT_BIGSTEP=0xc.5672a115506d8p-3 -DIWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-4 -DORIG_X2=1 -DORIG_X2=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-07-10 11:07:06 OpenCL compilation in 4069 ms
2019-07-10 11:07:06 1257787.owl not found, starting from the beginning.
2019-07-10 11:07:07 1257787 OK 2000 0.16%; 207 us/sq; ETA 0d 00:04; 46c7ab6803e1a365 (check 0.23s)
2019-07-10 11:07:11 1257787 20000 1.59%; 210 us/sq; ETA 0d 00:04; de7035c3244acc9b
2019-07-10 11:07:15 1257787 40000 3.18%; 210 us/sq; ETA 0d 00:04; 8e655f023b66fde1
2019-07-10 11:07:19 1257787 60000 4.77%; 210 us/sq; ETA 0d 00:04; e62c225bd51c0bf1
2019-07-10 11:07:23 1257787 80000 6.36%; 210 us/sq; ETA 0d 00:04; 2a37fdc214c2e7c0
2019-07-10 11:07:27 1257787 100000 7.95%; 210 us/sq; ETA 0d 00:04; 09f25999ff3326ca
...
2019-07-10 11:11:14 1257787 1180000 93.80%; 210 us/sq; ETA 0d 00:00; cfea93b53dd3f424
2019-07-10 11:11:19 1257787 1200000 95.39%; 210 us/sq; ETA 0d 00:00; 5a5b25f08d9912e4
2019-07-10 11:11:23 1257787 1220000 96.98%; 210 us/sq; ETA 0d 00:00; 66d4bd30b2ea4a7d
2019-07-10 11:11:27 1257787 1240000 98.57%; 209 us/sq; ETA 0d 00:00; 5c19c247002fe45c
2019-07-10 11:11:31 PP 1257787 / 1257787, 0000000000000001
2019-07-10 11:11:31 1257787 OK 1258000 100.00%; 211 us/sq; ETA 0d 00:00; f4d273818ecfa167 (check 0.23s)
2019-07-10 11:11:31 {"exponent":"1257787", "worktype":"PRP-3", "status":"P", "program":{"name":"gpuowl", "version":"v6.5-84-g30c0508"}, "timestamp":"2019-07-10 16:11:31 UTC", "aid":"0", "fft-length":65536, "res64":"0000000000000001", "residue-type":1}
2019-07-10 11:11:31 47840659 FFT 2560K: Width 8x8, Height 256x8, Middle 10; 18.25 bits/word
2019-07-10 11:11:31 using short carry kernels
2019-07-10 11:11:31 OpenCL args "-DEXP=47840659u -DWIDTH=64u -DSMALL_HEIGHT=2048u -DMIDDLE=10u -DWEIGHT_STEP=0xd.74e0985678c5p-3 -DIWEIGHT_STEP=0x9.8318a69b48b5p-4 -DWEIGHT_BIGSTEP=0xc.5672a115506d8p-3 -DIWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-4 -DORIG_X2=1 -DORIG_X2=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-07-10 11:11:35 OpenCL compilation in 4221 ms
2019-07-10 11:11:36 47840659 P-1 GPU RAM fits 374 stage2 buffers @ 20.0 MB each
2019-07-10 11:11:36 47840659 P-1 using 360 stage2 buffers (8 rounds)
2019-07-10 11:11:36 P-1 (B1=440000, B2=8800000, D=30030): primes 553065, expanded 560752, doubles 96745 (left 362408), singles 359575, total 456320 (83%)
2019-07-10 11:11:36 47840659 P-1 stage2: 279 blocks starting at block 15 (456320 selected)
2019-07-10 11:11:36 47840659 P-1 starting stage1
2019-07-10 11:12:33 47840659 10000 1.58%; 5632 us/sq; ETA 0d 00:59; c56022afd3a79281
2019-07-10 11:13:29 47840659 20000 3.15%; 5632 us/sq; ETA 0d 00:58; 57fe8bc3f01d07de
2019-07-10 11:14:25 47840659 30000 4.73%; 5630 us/sq; ETA 0d 00:57; 1c811f9a541e4c93
...
2019-07-10 12:04:10 47840659 560000 88.21%; 5632 us/sq; ETA 0d 00:07; e537d47dbf93bbb4
2019-07-10 12:05:06 47840659 570000 89.78%; 5634 us/sq; ETA 0d 00:06; f379136e785f8c92
2019-07-10 12:06:03 47840659 580000 91.36%; 5635 us/sq; ETA 0d 00:05; eb7fec5d09ff1974
2019-07-10 12:06:59 47840659 590000 92.93%; 5631 us/sq; ETA 0d 00:04; d0ea9a92d7208708
2019-07-10 12:07:55 47840659 600000 94.51%; 5630 us/sq; ETA 0d 00:03; 0247296cf97caff4
2019-07-10 12:08:52 47840659 610000 96.08%; 5630 us/sq; ETA 0d 00:02; 55ab076cedf5dee2
2019-07-10 12:09:48 47840659 620000 97.66%; 5630 us/sq; ETA 0d 00:01; 9af5dced9077c32a
2019-07-10 12:10:44 47840659 630000 99.23%; 5635 us/sq; ETA 0d 00:00; 7716930f904d8987
2019-07-10 12:11:12 P-1 stage2 too little memory 6983 MB for 360 buffers of 20971520 b
2019-07-10 12:11:52 Exiting because "P-1 not enough memory"
2019-07-10 12:11:52 Bye
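The stage 2 failure at the end of that log comes down to simple arithmetic. Below is a rough Python sketch resting on two assumptions. First, each stage 2 buffer holds one 8-byte double per FFT word: 2560K words × 8 B = 20971520 B, which is exactly the buffer size the log reports. Second, gpuowl's "MB" means 2^20 bytes. Under those assumptions, 360 buffers need 7200 MB, which is more than the 6983 MB reported at the start of stage 2, while the earlier "fits 374 buffers" estimate implied at least 7480 MB free.

The second part estimates the stage 1 length. It assumes the usual P-1 stage 1, which raises to the product of all prime powers ≤ B1 with about one squaring per bit of that product; this gives roughly B1/ln 2 iterations. That is consistent with the log reaching 630000 iterations at 99.23%. The helper names are hypothetical, not gpuowl functions.

```python
import math
from typing import List

MIB = 2**20


def fft_buffer_bytes(fft_k: int, bytes_per_word: int = 8) -> int:
    """Size of one FFT-length buffer, assuming one 8-byte double per word."""
    return fft_k * 1024 * bytes_per_word


def primes_up_to(n: int) -> List[int]:
    """Sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytes(len(range(i * i, n + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]


def stage1_bits(b1: int) -> int:
    """Bit length (rounded) of the product of all prime powers <= B1.

    With about one squaring per bit, this approximates the stage 1 iteration count.
    """
    total = 0.0
    for q in primes_up_to(b1):
        e, power = 1, q
        while power * q <= b1:  # find the largest e with q**e <= b1
            power *= q
            e += 1
        total += e * math.log2(q)
    return round(total)


if __name__ == "__main__":
    buf = fft_buffer_bytes(2560)  # FFT 2560K from the log
    print(buf, "bytes per buffer")
    print(360 * buf // MIB, "MiB for 360 buffers;", 374 * buf // MIB, "MiB for 374")
    iters = stage1_bits(440000)  # B1 from the log
    print(iters, "stage 1 squarings, about", round(iters * 5632e-6 / 60), "min at 5632 us/sq")
```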
minor spell check and grammar check
worktodo.txt entry
additional examples, including P-1 and no-aid forms
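The FFT Configurations table in the -h output and the bits/word figures in the log fit together simply. Here is a Python sketch, assuming gpuowl by default picks the smallest listed FFT whose upper bound covers the exponent; this reproduces the 64K and 2560K choices in the log. It also assumes bits/word is the exponent divided by the FFT length in words (1257787 / 65536 ≈ 19.19 and 47840659 / 2621440 ≈ 18.25, as logged). The time estimate assumes a PRP test takes about `exponent` squarings, and it matches the ~4 minute ETA for 1257787 at 210 us/sq. All names are hypothetical, and only the 8K to 5M rows are copied.

```python
from typing import List, Tuple

# (FFT size in K words, upper exponent limit in millions), copied from the
# "FFT Configurations" part of the -h output, 8K through 5M only.
FFT_TABLE: List[Tuple[int, float]] = [
    (8, 0.18), (32, 0.68), (64, 1.34), (128, 2.63), (192, 3.91),
    (224, 4.54), (256, 5.18), (288, 5.81), (320, 6.44), (352, 7.06),
    (384, 7.69), (448, 8.94), (512, 10.18), (576, 11.42), (640, 12.66),
    (704, 13.89), (768, 15.12), (896, 17.57), (1024, 20.02), (1152, 22.45),
    (1280, 24.88), (1408, 27.31), (1536, 29.72), (1792, 34.54), (2048, 39.34),
    (2304, 44.13), (2560, 48.90), (2816, 53.66), (3072, 58.41), (3584, 67.87),
    (4096, 77.30), (4608, 86.70), (5120, 96.07),
]


def pick_fft_k(exponent: int) -> int:
    """Smallest listed FFT size (in K) whose upper limit covers the exponent."""
    for fft_k, max_millions in FFT_TABLE:
        if exponent <= max_millions * 1_000_000:
            return fft_k
    raise ValueError("exponent is beyond the rows copied into FFT_TABLE")


def bits_per_word(exponent: int, fft_k: int) -> float:
    """Exponent bits carried by each FFT word."""
    return exponent / (fft_k * 1024)


def prp_minutes(exponent: int, us_per_sq: float) -> float:
    """Rough PRP run time, assuming about `exponent` squarings."""
    return exponent * us_per_sq / 60e6


if __name__ == "__main__":
    for p in (1257787, 47840659):
        k = pick_fft_k(p)
        print(p, f"FFT {k}K", f"{bits_per_word(p, k):.2f} bits/word")
    print(f"~{prp_minutes(1257787, 210):.1f} min for 1257787 at 210 us/sq")
```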

#1275
14562₈ Posts
Quote:
PS: This proposal should also handle the case where the checkpoint file is invalid because a power loss happened while it was being written. In that case the file is never closed and is left without an end-of-file mark; on reboot, a filesystem check truncates the checkpoint file to zero length.

#1276
P90 years forever!
Aug 2002
Yeehaw, FL
5·11·137 Posts
Happy me.
I tried to return the XFX Radeon VII for a replacement. Amazon was out of stock, so they simply refunded the purchase. I ordered an ASRock Radeon VII instead. I installed it today, and preliminary results look great. First, the stock voltage is 50 mV lower. A 15% memory overclock with a 40 mV undervolt gives no errors in a short test. 0.85 ms/iteration at 5M FFT! Longer testing is required, of course.

A question for the Linux gurus: I use "crontab -e" to run mprime at boot. This does not work for gpuowl, which must run as root. I tried "sudo crontab -e", but either I messed up the entries or it does not work as I expected. What is the recommended way for root to start gpuowl at boot?
Last fiddled with by Prime95 on 2019-07-11 at 18:48