2020-05-25, 03:19 | #12 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1101111000100_{2} Posts |
Interim residues for Mersenne number large exponents
For purposes of this post, "large exponents" means well ahead of the first-test wavefront, which is at ~102M-114M as of March 2021.
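Sizes in these posts are quoted both as exponents p and as decimal digit counts of M_p = 2^p - 1 (~50Mdigit, 100Mdigit, 300Mdigit). Because 2^p is never a power of ten, M_p has floor(p·log10 2) + 1 decimal digits. Below is a minimal sketch in Python. The function name is illustrative. It uses the standard decimal module so that double-precision rounding cannot push the floor across a digit boundary.

```python
from decimal import Decimal, localcontext

def mersenne_decimal_digits(p: int) -> int:
    """Decimal digit count of M_p = 2**p - 1, computed without building M_p.

    2**p is never a power of ten, so M_p has floor(p * log10(2)) + 1 digits.
    50 significant digits leave a wide margin against rounding error.
    """
    with localcontext() as ctx:
        ctx.prec = 50
        return int(Decimal(p) * Decimal(2).log10()) + 1   # int() floors a positive value

for p in (166000013, 332220523, 999999937, 3321928097):
    print(f"M{p}: {mersenne_decimal_digits(p):,} decimal digits")
```

For these four exponents the sketch reproduces the digit counts quoted in these posts: 49,970,984; 100,008,343; 301,029,977; and 1,000,000,001.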
Note: while both PRP and LL interim residues are listed here, PRP with GEC and proof generation should be used for large exponents and first-test wavefront exponents whenever practical. High proof powers (at least 8) are recommended. Proof generation enables verification at ~1% or less of the effort of a traditional full double check. GEC essentially eliminates errors in final residues. At 100Mdigit, a small sample of LL, DC, and TC results indicates an error rate of about 20% per test. That is consistent with the error rate being proportional to run time. Extrapolating to higher exponents indicates a high probability of error near 1G. (Run time at 999M is (999/332)^2.1 ≈ 10.1 times longer; 1 - (1 - 0.20)^10.1 ≈ 0.89, an 89% probability of a bad result.)
~50Mdigit interim residues
166000013 (49,970,984 decimal digits), from a complete PRP3 run with a proof certified valid, meaning all interim residues are valid
FFT: 9M 1K:9:512 (17.59 bpw)
Code:
gpuowl-win-v6.11-364-g36f4e2a
2020-08-09 04:21:37 asr2/radeonvii0 166000013 OK 800 0.00%; 1418 us/it; ETA 2d 17:23; 9c30f47224fb8783 (check 0.96s)
2020-08-09 04:26:20 asr2/radeonvii0 166000013 OK 200000 0.12%; 1417 us/it; ETA 2d 17:15; c6ae0c584288f318 (check 0.94s)
2020-08-09 04:45:17 asr2/radeonvii0 166000013 OK 1000000 0.60%; 1417 us/it; ETA 2d 16:56; 8d41ce621b768b00 (check 1.03s)
2020-08-09 08:18:39 asr2/radeonvii0 166000013 OK 10000000 6.02%; 1416 us/it; ETA 2d 13:22; b8fba9175797f5ed (check 0.95s)
2020-08-11 07:15:04 asr2/radeonvii0 166000013 OK 100000000 60.24%; 1418 us/it; ETA 1d 02:00; b00fa62adb2d712c (check 0.95s)
Mlucas v17.0 LL on 1 core of a Xeon E5645 (not recommended: the timings below are actually msec/iteration, not sec/iteration as labeled, and even so a full run would take over 8 years; v17.0 did not include the Jacobi check. The odds of correct completion over such a long run are negligible unless the system has ECC RAM, which this system does.)
332220523 (100,008,343 decimal digits)
Code:
[Jul 22 13:31:39] M332220523 Iter# = 10000 [ 0.00% complete] clocks = 02:16:04.515 [816.4516 sec/iter] Res64: 1A313D709BFA6663. AvgMaxErr = 0.171224865. MaxErr = 0.250000000.
[Jul 23 09:51:21] M332220523 Iter# = 100000 [ 0.03% complete] clocks = 02:16:05.131 [816.5132 sec/iter] Res64: 91B688264B5B3F39. AvgMaxErr = 0.171926060. MaxErr = 0.250000000.
[Jul 31 21:47:38] M332220523 Iter# = 1000000 [ 0.30% complete] clocks = 02:15:44.087 [814.4088 sec/iter] Res64: 7CC62737AA46CDF8. AvgMaxErr = 0.171821204. MaxErr = 0.234375000.
[Aug 10 08:05:29] M332220523 Iter# = 2000000 [ 0.60% complete] clocks = 02:15:26.909 [812.6910 sec/iter] Res64: 627DFA42F70BD3B4. AvgMaxErr = 0.172038670. MaxErr = 0.250000000.
[Oct 26 19:22:44] M332220523 Iter# = 10000000 [ 3.01% complete] clocks = 02:20:19.131 [841.9132 sec/iter] Res64: 2B8CC43403D28DAC. AvgMaxErr = 0.171851920. MaxErr = 0.250000000.
[Mar 22 07:57:38] M332220523 Iter# = 100000000 [30.10% complete] clocks = 01:52:43.917 [676.3917 sec/iter] Res64: 0926E801B07CD05E. AvgMaxErr = 0.171837840. MaxErr = 0.250000000.
gpuowl v6.11-292 (manageable: projected total time is about 15 days on a Radeon VII)
Code:
2020-05-24 20:41:20 asr2/radeonvii3-w2 332220523 LL 100000 0.03%; 3843 us/it; ETA 14d 18:33; 91b688264b5b3f39
2020-05-24 21:39:36 asr2/radeonvii3-w2 332220523 LL 1000000 0.30%; 3870 us/it; ETA 14d 20:03; 7cc62737aa46cdf8
Code:
2020-07-30 09:48:45 asr2/radeonvii2 332220523 LL 2000000 0.60%; 3722 us/it; ETA 14d 05:25; 627dfa42f70bd3b4 Code:
2022-07-16 19:12:31 asr2/radeonvii3 332220523 LL 10000000 3.01%; 3861 us/it; ETA 14d 09:34; 2b8cc43403d28dac
gpuowl v6.0-b7bb1c3 PRP 332220523 on an RX480
Code:
2019-02-04 23:30:02 condorella/rx-480 332220523 OK 10000 0.00%; 16.52 ms/sq; ETA 63d 12:29; 503cd91d7b8e30e5 (check 7.48s) Code:
2021-03-15 18:04:59 332220523 100000 0.03%; 18714 us/sq; ETA 71d 22:30; 951c94f813216db9 Code:
2021-03-15 18:33:02 condorella/rx480 332220523 OK 200000 0.06%; 14796 us/it; ETA 56d 20:39; 6cd7d19bb77ad049 (check 6.58s)
2021-03-15 19:22:27 condorella/rx480 332220523 OK 400000 0.12%; 14793 us/it; ETA 56d 19:30; b12cf8adffda122c (check 6.56s)
Code:
2020-09-03 13:13:03 asr2/5700xt 332220523 OK 200000 0.06%; 7520 us/it; ETA 28d 21:34; 6cd7d19bb77ad049 (check 4.05s)
CUDALucas v2.06 on GTX1080Ti (LL, no Jacobi check, unverified; not recommended at an estimated 1.4 years on a GTX1080Ti)
999999937 (301,029,977 decimal digits)
Code:
| Jan 09 04:48:51 | M999999937 10000 0x567ad47461d3bb5f | 57344K 0.21875 41.1682 41.16s | 473:21:41:27 0.00% |
| Jan 20 08:14:38 | M999999937 100000 0xe776f4a0dcd3491d | 57344K 0.18750 44.3713 44.37s | 506:19:41:57 0.01% |
| Jan 20 19:26:39 | M999999937 1000000 0x141a108c13a86d5a | 57344K 0.18750 45.3946 45.39s | 516:19:59:55 0.10% |
| Jan 23 07:03:54 | M999999937 5000000 0x0811f10855dab84c | 57344K 0.20313 44.9825 44.98s | 516:10:56:38 0.50% |
Code:
2021-03-27 14:28:07 gpuowl v6.11-380-g79ea0cc
2021-03-27 14:29:16 asr3/gtx1080 999999937 LL 1000 0.00%; 52191 us/it; ETA 604d 01:29; ddadfed64e080856
(Jacobi check on exit passed; continuing:)
2021-03-27 14:55:18 asr3/gtx1080 999999937 LL 10000 0.00%; 50442 us/it; ETA 583d 19:32; 567ad47461d3bb5f
2021-03-27 15:03:46 asr3/gtx1080 999999937 LL 20000 0.00%; 50782 us/it; ETA 587d 17:50; 78a2f270a1bba92d
2021-03-27 15:12:14 asr3/gtx1080 999999937 LL 30000 0.00%; 50794 us/it; ETA 587d 21:05; d79b942904525426
2021-03-27 15:20:42 asr3/gtx1080 999999937 LL 40000 0.00%; 50823 us/it; ETA 588d 05:02; 650ca8c106ac7d12
2021-03-27 15:29:11 asr3/gtx1080 999999937 LL 50000 0.01%; 50869 us/it; ETA 588d 17:34; 98420630c9a4b877
2021-03-27 15:37:39 asr3/gtx1080 999999937 LL 60000 0.01%; 50850 us/it; ETA 588d 12:03; adb31fa7cf23b0ba
2021-03-27 15:46:08 asr3/gtx1080 999999937 LL 70000 0.01%; 50847 us/it; ETA 588d 11:13; 20b88580b0d5c8a5
2021-03-27 15:54:38 asr3/gtx1080 999999937 LL 80000 0.01%; 51002 us/it; ETA 590d 06:05; 2b4ab3e44fcbdc84
2021-03-27 16:03:07 asr3/gtx1080 999999937 LL 90000 0.01%; 50957 us/it; ETA 589d 17:20; b9c6eeca4e553904
2021-03-27 16:11:37 asr3/gtx1080 999999937 LL 100000 0.01%; 50972 us/it; ETA 589d 21:24; e776f4a0dcd3491d
999999937 Iteration 110,000,000 Residue64 0x87ded9d3ab3e7f38
See also https://mersenneforum.org/showthread...859#post495859
The PRP equivalent is marginally feasible with Gpuowl on a Radeon VII, at ~0.51 years to complete; LL in Gpuowl on a Radeon VII is also possible but not recommended, since it is protected only by an occasional Jacobi check, which has a ~50% chance of detecting an error. Such long runs are likely to go wrong with undetected errors.
Gpuowl v6.2-e2ffe65 PRP on RX480:
Code:
2019-02-08 17:48:10 condorella/rx480 999999937 10000 0.00%; 72.02 ms/sq; ETA 833d 14:01; a30a0c45e9fb828c
2019-02-08 21:16:26 condorella/rx480 999999937 100000 0.01%; 74.38 ms/sq; ETA 860d 18:51; 3efc806b68d92b86
2021-03-16 12:20:54 condorella/rx480 999999937 OK 1000000 0.10%; 49236 us/it; ETA 569d 07:00; e1b53ccf581f928b (check 22.08s)
Mlucas:
Code:
[2022-05-01 20:39:46] M999999937 Iter# = 10000 [ 0.00% complete] clocks = 04:05:49.917 [1474.9918 msec/iter] Res64: A30A0C45E9FB828C. AvgMaxErr = 0.154966345. MaxErr = 0.218750000. Residue shift count = 248482955.
gpuowl v4.6 PRP on RX480 (not recommended at an estimated 7.8 years to completion; also moot, since it has a known small factor). Following is combined PRP/P-1, type 0, B1 bound 0, so presumably base = 3.
2018-11-01 16:35:37 condorella-rx480 10000/1500000041 [ 0.00%], 164.51 ms/it [163.50, 174.58]; ETA 2856d 01:42; 4fc4ebb3728095f7
CUDALucas 2.06 fails on 1,500,000,043 with an error similar to that for 2Gbit (see below).
2Gbit interim residues
P-1 stage 1 appears possible on gpuowl v6.11-380, at over 6 days on a Radeon VII:
Code:
2021-03-26 13:57:40 asr2/radeonvii4 2147483563 P1 B1=11000000, B2=600000000; 15869712 bits; starting at 0
2021-03-26 14:02:48 asr2/radeonvii4 saved
2021-03-26 14:03:21 asr2/radeonvii4 2147483563 P1 10000 0.06%; 34113 us/it; ETA 6d 06:17; d78cbf554970d8c1
https://www.mersenne.ca/exponent/2147483743
Projected PRP time on a Radeon VII is 2.63 years; not recommended.
Gpuowl V6.11-380 PRP interim residues with timings:
Code:
2021-03-26 12:43:44 asr2/radeonvii4 2147483743 OK 20000 0.00%; 38636 us/it; ETA 960d 07:01; 304a08968e896b20 (check 23.14s)
2021-03-26 13:36:46 asr2/radeonvii4 2147483743 OK 100000 0.00%; 38636 us/it; ETA 960d 05:56; 0637bf5d33611a9d (check 22.85s)
Code:
[2022-05-01 22:01:47] M2147483743 Iter# = 10000 [ 0.00% complete] clocks = 01:11:35.048 [429.5048 msec/iter] Res64: FDB3067B242478FA. AvgMaxErr = 0.234596052. MaxErr = 0.312500000. Residue shift count = 1753623453.
[2022-05-01 23:14:04] M2147483743 Iter# = 20000 [ 0.00% complete] clocks = 01:11:42.295 [430.2295 msec/iter] Res64: 304A08968E896B20. AvgMaxErr = 0.235129663. MaxErr = 0.312500000. Residue shift count = 674700330.
Code:
[2022-05-01 21:58:16] M2147483743 Iter# = 10000 [ 0.00% complete] clocks = 01:18:25.244 [470.5244 msec/iter] Res64: 94A87DFAE884EB75. AvgMaxErr = 0.234346936. MaxErr = 0.312500000. Residue shift count = 1552261820.
[2022-05-02 09:45:49] M2147483743 Iter# = 100000 [ 0.00% complete] clocks = 01:18:24.558 [470.4559 msec/iter] Res64: 3A997CAD52805EF0. AvgMaxErr = 0.234872706. MaxErr = 0.296875000. Residue shift count = 1308916996.
Code:
2022-05-04 11:42:54 test/radeonvii 2147483743 LL 10000 0.00%; 34732 us/it; ETA 863d 06:23; 94a87dfae884eb75
2022-05-04 12:35:20 test/radeonvii 2147483743 LL 100000 0.00%; 34700 us/it; ETA 862d 10:25; 3a997cad52805ef0
Gpuowl v6.5-84 PRP3 (not recommended at an estimated 6.3 years on a Radeon VII; lacks PRP proof capability)
M3,321,928,097 (1,000,000,001 decimal digits; this one is moot, as it has multiple known small factors)
Code:
FFT 196608K: Width 512x8, Height 256x8, Middle 12; 16.50 bits/word
2020-05-24 17:21:43 radeonvii3 3,321,928,097 20000 0.00%; 60053 us/sq; ETA 2308d 21:54; 1a05f0ca51fb8e7a
2020-05-24 18:41:56 radeonvii3 3,321,928,097 100000 0.00%; 60122 us/sq; ETA 2311d 12:24; 5b2cb77f57840bcc
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Last fiddled with by kriesel on 2022-10-31 at 03:17 Reason: 2G P-1 stage 2 memory requirement |
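The projected durations quoted in the post above ("over 8 years" for Mlucas on one Xeon core, "1.4 years" for CUDALucas on a GTX1080Ti, "2.63 years" for PRP of M2147483743 on a Radeon VII) are essentially iteration count times time per iteration. An LL test needs p - 2 squarings, and a PRP test about p. The following is a rough Python sketch that ignores checkpoint, GEC or Jacobi check, and proof-generation overhead. The function name and the "LL"/"PRP" switch are illustrative only.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def projected_years(p: int, sec_per_iter: float, test: str = "PRP") -> float:
    """Rough wall-clock years for a complete test of M_p at a steady iteration time.

    LL needs p - 2 squarings; PRP needs about p. Check and proof overheads are ignored.
    """
    iterations = p - 2 if test == "LL" else p
    return iterations * sec_per_iter / SECONDS_PER_YEAR

# Iteration times taken from the logs above (Mlucas "sec/iter" is really msec/iter).
print(round(projected_years(332220523, 0.8164516, "LL"), 1))   # Mlucas v17.0, 1 Xeon core
print(round(projected_years(999999937, 0.0449825, "LL"), 2))   # CUDALucas, GTX1080Ti
print(round(projected_years(2147483743, 0.038636), 2))         # gpuowl PRP, Radeon VII
```

These come out to about 8.6, 1.43, and 2.63 years, in line with the figures quoted in the post.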
2021-03-02, 03:21 | #13 |
LL with shift
See also Ernst's post on shift in Mlucas.
Think of the primality testing software's math routines as an emulation of a very wide word calculator. It dynamically adjusts to a width of p bits, where p is the prime exponent of Mp = 2^p - 1.
1) All the LL tests should use the same (before shifting) seed value, S_0 = 4, not 10 or the other known seed values, nor bad seed values such as assorted other powers of two, so that the composite numbers' final test residues will be comparable. That's separate from considerations of shift. See https://www.mersenneforum.org/showpo...12&postcount=6
2) A very simple example with zero shift: p = 7, M7 = 127.
S_0 = 4 = 100.b; shift = 0, so the value in our emulated wide-word calculator is 4.
S_1 = 4^2 - 2 = 14 = 1110.b; the shift does not change from one iteration to the next.
S_2 = 14^2 - 2 = 194 = 1 1000010.b, which mod M7 (the bit above the 7-bit width wraps around to the bottom) is 67 = 1000011.b
3) Second run, same simple example except with shift 1: p = 7, S_0 = 4, startingshift = 1, so the value in our simulated FFT calculator lands one bit to the left, looking like 8 = 1000.b but representing 100.0b. At this point previousshift = startingshift.
Square it, compute the new shift, and apply a shifted -2: 8^2 = 64 = 1000000.b, representing 10000.00b. shift = previousshift * 2 = 2, so subtract 2<<shift = 8, leaving 56 = 111000.b (representing 14 shifted 2 left, 1110.00b). Mod Mp is relative to that shifted binary point.
For the next iteration, shift = previousshift * 2 = 4, and repeat: S_2 = 56^2 - (2<<4) = 3136 - 32 = 3104, which mod (M7<<4) is 1072 = 1000011.0000b (67<<4). But wait, it's actually wrapped in p = 7 bits (mod M7): 1000 011.0000b wraps to 011.1000b, i.e. 0111000b = 56 with the binary point four places from the right. That is not 3.5; it's a wrapped representation of 67 shifted 4 bits left.
4) The shift that programs supporting nonzero shift report in their test result is startingshift. But they retrieve the residue from the simulated calculator beginning at lsb = the final shift computed after the last iteration, and do the final de-shift and de-wrap then. It's more complicated than just doubling the shift as in the preceding example, because the unit point wraps around many times: since shifting by p bits is the same as no shift, the shift effectively doubles mod p each iteration.
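The bookkeeping in items 2-4 can be checked with a small big-integer sketch in Python. This is not how prime95, Mlucas, or CUDALucas implement it; they use floating-point FFT multiplication. The sketch only mimics the shift accounting described above: the stored value is S_k·2^shift mod M_p, squaring doubles the shift, the -2 is applied pre-shifted, and the result is de-shifted at the end. The name ll_residue is hypothetical.

```python
from typing import Optional

def ll_residue(p: int, shift: int = 0, iterations: Optional[int] = None) -> int:
    """Lucas-Lehmer iteration for M_p = 2**p - 1, storing the value bit-shifted.

    The stored word x always equals S_k * 2**shift (mod M_p). Because
    2**p == 1 (mod M_p), a p-bit shift is no shift, so shift is kept mod p.
    Returns the de-shifted S_n mod M_p (n = p - 2 for a full test).
    """
    m = (1 << p) - 1
    n = p - 2 if iterations is None else iterations
    shift %= p
    x = (4 << shift) % m                       # shifted seed S_0 = 4
    for _ in range(n):
        x = x * x % m                          # squaring doubles the shift...
        shift = 2 * shift % p                  # ...reduced mod p (wraparound)
        x = (x - (2 << shift)) % m             # subtract 2, shifted to match
    return x * pow(2, (p - shift) % p, m) % m  # final de-shift and de-wrap

# Worked example from item 3: p = 7, two iterations, with and without shift.
print(ll_residue(7, shift=0, iterations=2), ll_residue(7, shift=1, iterations=2))  # 67 67
# Full tests: M7 is prime (residue 0). M11 = 23 * 89 is not, and its nonzero
# residue is the same whatever starting shift is used.
print([ll_residue(7, s) for s in range(7)])
print({ll_residue(11, s) for s in range(11)})  # {1736}
```

With shift 1 the stored value after two iterations is 56, exactly as in item 3, and de-shifting it by 4 bits recovers 67.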
5) In practice, for programs that implement shift (offset), the initial shift is chosen pseudorandomly in the range 1 to p-1. (I think that was 0 to p-1 at one point, but it was changed to reduce the small chance of separate runs coinciding at 0 shift. Shifting by p bits is equivalent to no shift, so it should also be excluded. Some applications, and some versions of prime95, CUDALucas, and Gpuowl, did not implement shift, so they always used 0 shift.) So the initial shift averages about p/2. For shift = 3, 5 iterations for p = 7 give successive iteration shifts of 6, 12, 24, 48, 96; 96 mod 7 = 5 is the shift used to recover the final residue.
6) Why go to all this trouble of shifting and its extra accounting? When double-precision floating-point FFTs are used to rapidly emulate a monstrously wide integer calculator, for p in the many millions, round-off error sometimes affects the outcome, or other errors occur. Having the data shifted differently relative to the 16-19 bits/word boundaries causes those errors to behave differently, enough that their effect on the final result changes. Shift can also change the behavior of data-dependent software bugs, including, ironically, a bug in the shift code with a one-in-a-million chance of producing wrong residues. Or here. The use of a pseudorandom offset is one of a number of techniques to avoid, detect, or work around numerical error. Also, a result-reporting error that causes duplicate reports from a single run, which would lead to a singly tested exponent being falsely recorded as double-checked with a match, can be detected because the shift values match.
7) The authors of the major GIMPS primality testing software could give a clearer explanation, and probably have. Ernst's Odroid magazine article may include one. George, Mihai, and Ernst have posted about these matters in the prime95, gpuowl, and Mlucas development threads.
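The shift arithmetic in item 5 can be written in closed form. After n squarings the shift is startingshift·2^n mod p, so a client only needs the starting shift and the iteration count to locate the residue. Below is a minimal Python sketch. The function names are illustrative, and Python's random module stands in for whatever pseudorandom source a real client uses. Because p is an odd prime, 2 is invertible mod p, so a nonzero starting shift never doubles its way to 0.

```python
import random
from typing import List

def shift_sequence(p: int, start: int, n: int) -> List[int]:
    """Shift after each of n squarings, reduced mod p (a p-bit rotation is a no-op)."""
    shifts, s = [], start % p
    for _ in range(n):
        s = 2 * s % p
        shifts.append(s)
    return shifts

def final_shift(p: int, start: int, n: int) -> int:
    """Closed form start * 2**n mod p; cheap even for n in the billions."""
    return start * pow(2, n, p) % p

def random_start_shift(p: int, rng: random.Random) -> int:
    """Pseudorandom starting shift in 1..p-1 (0 and p are both 'no shift')."""
    return rng.randrange(1, p)

print(shift_sequence(7, 3, 5))   # [6, 5, 3, 6, 5]
print(final_shift(7, 3, 5))      # 5, matching 96 mod 7
```

Reducing mod p at every step gives 6, 5, 3, 6, 5 instead of 6, 12, 24, 48, 96, but the final value is the same 5.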
8) I think the list of "shifty" GIMPS programs includes prime95/mprime beginning by v17, CUDALucas beginning with v2.05, Mlucas beginning with v18, and rare early gpuowl versions; the list of "shiftless" GIMPS programs includes clLucas, early CUDALucas, most versions of gpuowl (whether implementing LL, PRP, or both), and Mlucas before v18. Shift is useful in primality testing, which gets done more than once per exponent. It is not useful in computations that are typically done only once, such as TF or P-1. Shift is known to be absent from prime95 P-1 factoring, CUDAPm1, and gpuowl P-1. Shift is also absent from Mlucas v20.x at FFT lengths above ~256M words.
Top of this thread https://www.mersenneforum.org/showthread.php?t=24003
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Last fiddled with by kriesel on 2021-11-04 at 18:11 Reason: typofix; update for Mlucas V20.x |
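Item 6 of the post above, on detecting duplicate reports, can be illustrated with a hypothetical comparison of two result records. ResultRecord and compare_results are invented for this sketch; they are not PrimeNet's schema or logic, and the sample values are made up. Two genuinely independent runs with pseudorandom starting shifts in 1..p-1 agree on shift only with probability about 1/(p-1). Matching residues combined with matching nonzero shifts therefore point to one run being reported twice. Shiftless programs always report 0, so for them the shift carries no such information.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ResultRecord:
    """Hypothetical result record; not PrimeNet's actual schema."""
    exponent: int
    res64: str
    shift: int   # starting shift reported by the client (0 for shiftless programs)

def compare_results(a: ResultRecord, b: ResultRecord) -> str:
    if a.exponent != b.exponent:
        return "different exponents"
    if a.res64.lower() != b.res64.lower():
        return "mismatch"
    if a.shift == 0 and b.shift == 0:
        # Shiftless programs always report 0, so shift cannot tell these apart.
        return "match (shift gives no independence evidence)"
    if a.shift == b.shift:
        # Same nonzero pseudorandom shift: most likely one run reported twice.
        return "suspect duplicate report"
    return "independent match"

first = ResultRecord(1000003, "0123456789ABCDEF", 412345)   # made-up values
print(compare_results(first, ResultRecord(1000003, "0123456789abcdef", 412345)))
print(compare_results(first, ResultRecord(1000003, "0123456789abcdef", 87)))
```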
2021-03-27, 21:58 | #14 |
Challenges of large exponents
Server Support
TF
Software availability
TF
Memory requirements
There are several requirements to consider: disk space, system RAM, and sometimes GPU RAM; also how much data may be transferred to and from the PrimeNet server in the case of PRP proof and certification.
Run time scaling
Run time scaling for some specific hardware and software combinations can be found in the relevant software-specific reference threads.
Hardware lifetime
Run times of large exponents may exceed the probable useful life of the hardware, or the user's remaining life expectancy. A good backup regimen and transfer to replacement hardware is a workaround, if the user has enough patience and life expectancy remaining. Very long undertakings may also need a succession plan for personnel.
Software & Hardware combination reliability
Overall experience
100M exponent experience
Limited data on 100Mdigit
Higher
CUDALucas can only perform Lucas-Lehmer tests, not PRP. It lacks the Jacobi symbol error check, and the GEC that is so useful in keeping PRP runs reliable is not applicable to the LL residue sequence, so it is not applicable to CUDALucas either. CUDALucas is also substantially slower than recent Gpuowl (v6.11-3xx, v7.x-y) on the same hardware and assignment. Avoid CUDALucas; use Gpuowl whenever possible.
100Mdigit experiments
Obtaining matching interim residues of 100Mdigit exponents is usually straightforward with available software on adequately reliable hardware. Matching interim residues were produced by independent runs on the first tries, among Mlucas and Gpuowl for LL, and Gpuowl for PRP. (See https://www.mersenneforum.org/showpost.php?p=546384 for details.)
~gigabit or 300Mdigit experiments
Obtaining matching interim residues of 300Mdigit exponents took 4 attempts to get two matching runs. Details follow; results are summarized at the above link.
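The reliability concern behind these experiments can be made concrete with the extrapolation given in post #12 of this thread. That post assumes an observed error rate of ~20% per LL/DC/TC test at 100Mdigit (p ≈ 332M), run time scaling as p^2.1, and error probability growing with run time. The following Python sketch applies only that simple model: a constant error hazard per unit of run time on fixed hardware. It is not a measured result for any larger exponent, and the function names are illustrative.

```python
def run_time_ratio(p: float, p_ref: float = 332e6, power: float = 2.1) -> float:
    """Run-time multiple relative to a reference exponent, assuming time ~ p**power."""
    return (p / p_ref) ** power

def probability_bad(p: float, p_ref: float = 332e6, err_ref: float = 0.20,
                    power: float = 2.1) -> float:
    """Chance of a wrong final residue, treating a run r times as long as the
    reference as r back-to-back reference runs that each fail with err_ref."""
    r = run_time_ratio(p, p_ref, power)
    return 1.0 - (1.0 - err_ref) ** r

for p in (332e6, 999e6, 1.25e9, 2.147e9):
    print(f"{p / 1e6:6.0f}M  {run_time_ratio(p):5.1f}x run time  {probability_bad(p):.1%} bad")
```

At 999M this gives a run-time multiple of about 10.1 and a bad-result probability near 0.895, matching the ~89% in post #12 to within rounding of the ratio.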
Interim residues (mostly unverified, except for green)
CUDALucas v2.06 on GTX1080Ti (LL, no Jacobi check, unverified; not recommended at an estimated 1.4 years on a GTX1080Ti)
999999937 (301,029,977 decimal digits)
Code:
| Jan 09 04:48:51 | M999999937 10000 0x567ad47461d3bb5f | 57344K 0.21875 41.1682 41.16s | 473:21:41:27 0.00% |
| Jan 20 08:14:38 | M999999937 100000 0xe776f4a0dcd3491d | 57344K 0.18750 44.3713 44.37s | 506:19:41:57 0.01% |
| Jan 20 19:26:39 | M999999937 1000000 0x141a108c13a86d5a | 57344K 0.18750 45.3946 45.39s | 516:19:59:55 0.10% |
| Jan 23 07:03:54 | M999999937 5000000 0x0811f10855dab84c | 57344K 0.20313 44.9825 44.98s | 516:10:56:38 0.50% |
Code:
| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Mar 27 13:24:34 | M999999937 10000 0xd0ba62caab74a325 | 57344K 0.20313 58.3992 224.72s | 671:06:41:49 0.00% |
| Mar 27 13:34:21 | M999999937 20000 0x3b0de44a71284153 | 57344K 0.20313 58.7176 587.17s | 675:10:19:38 0.00% |
| Mar 27 13:44:08 | M999999937 30000 0xee988cb77112ea03 | 57344K 0.20313 58.7149 587.14s | 676:19:10:58 0.00% |
| Mar 27 13:53:55 | M999999937 40000 0x906263cb2b36ac4c | 57344K 0.20313 58.7086 587.08s | 677:11:05:19 0.00% |
| Mar 27 14:03:43 | M999999937 50000 0x454a88b55988ce1e | 57344K 0.21094 58.7162 587.16s | 677:20:59:32 0.00% |
Any residues produced after a known-bad residue (red or purple here) will also be bad. The grayed lines are preserved since they show iteration speed variation.
Gpuowl (which has the Jacobi check for LL in some versions) on RX480: not recommended at 1.56 years or more, depending on gpuowl version; GTX1080 similar; Radeon VII marginally feasible at ~6 months
Code:
2019-02-08 17:34:48 gpuowl v6.2-e2ffe65
2019-02-08 17:34:48 condorella/rx480 -user kriesel -cpu condorella/rx480 -device 0 -fft +0
2019-02-08 17:34:48 condorella/rx480 999999937 FFT 73728K: Width 256x8, Height 256x8, Middle 9; 13.25 bits/word
2019-02-08 17:34:48 condorella/rx480 using long carry kernels
2019-02-08 17:34:54 condorella/rx480 OpenCL compilation in 5333 ms, with "-DEXP=999999937u -DWIDTH=2048u -DSMALL_HEIGHT=2048u -DMIDDLE=9u -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-02-08 17:35:06 condorella/rx480 999999937.owl not found, starting from the beginning.
2019-02-08 17:37:08 condorella/rx480 999999937 OK 800 0.00%; 71.64 ms/sq; ETA 829d 05:22; c3c8e02da339fdfa (check 31.90s)
2019-02-08 17:48:10 condorella/rx480 999999937 10000 0.00%; 72.02 ms/sq; ETA 833d 14:01; a30a0c45e9fb828c
2019-02-08 21:16:26 condorella/rx480 999999937 100000 0.01%; 74.38 ms/sq; ETA 860d 18:51; 3efc806b68d92b86
2021-03-15 23:16:33 999999937 FFT 57344K: Width 256x8, Height 256x8, Middle 7; 17.03 bits/word
gpuowl v6.11-380-g79ea0cc on RX480 continuation
Code:
2021-03-16 06:23:01 condorella/rx480 999999937 FFT: 56M 4K:14:512 (17.03 bpw)
2021-03-16 06:23:01 condorella/rx480 Expected maximum carry32: 833C0000
2021-03-16 06:51:54 condorella/rx480 999999937 OK 600000 0.06%; 49222 us/it; ETA 569d 08:39; c974a7f641226218 (check 21.93s)
2021-03-16 09:36:25 condorella/rx480 999999937 OK 800000 0.08%; 49242 us/it; ETA 569d 11:23; 321dd77853313c0c (check 22.04s)
2021-03-16 12:20:54 condorella/rx480 999999937 OK 1000000 0.10%; 49236 us/it; ETA 569d 07:00; e1b53ccf581f928b (check 22.08s)
gpuowl v6.11-380-g79ea0cc independent run from start on GTX1080
Code:
2021-03-27 14:28:07 gpuowl v6.11-380-g79ea0cc
2021-03-27 14:28:07 config: -device 1 -user kriesel -cpu asr3/gtx1080 -maxAlloc 6500 -proof 9 -cleanup -yield -use NO_ASM
2021-03-27 14:28:07 device 1, unique id ''
2021-03-27 14:28:07 asr3/gtx1080 999999937 FFT: 56M 4K:14:512 (17.03 bpw)
2021-03-27 14:28:07 asr3/gtx1080 Expected maximum carry32: 833C0000
2021-03-27 14:28:16 asr3/gtx1080 OpenCL args "-DEXP=999999937u -DWIDTH=4096u -DSMALL_HEIGHT=512u -DMIDDLE=14u -DPM1=0 -DCARRY64=1 -DWEIGHT_STEP_MINUS_1=0xf.57fb440c6997p-4 -DIWEIGHT_STEP_MINUS_1=-0xf.aa3b4ca84faap-5 -DNO_ASM=1 -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2021-03-27 14:28:22 asr3/gtx1080
2021-03-27 14:28:22 asr3/gtx1080 OpenCL compilation in 6.10 s
2021-03-27 14:28:24 asr3/gtx1080 999999937 LL 0 loaded: 0000000000000004
2021-03-27 14:29:14 asr3/gtx1080 Stopping, please wait..
2021-03-27 14:29:16 asr3/gtx1080 999999937 LL 1000 0.00%; 52191 us/it; ETA 604d 01:29; ddadfed64e080856
(Jacobi check on exit passed; continuing:)
2021-03-27 14:55:18 asr3/gtx1080 999999937 LL 10000 0.00%; 50442 us/it; ETA 583d 19:32; 567ad47461d3bb5f
2021-03-27 15:03:46 asr3/gtx1080 999999937 LL 20000 0.00%; 50782 us/it; ETA 587d 17:50; 78a2f270a1bba92d
2021-03-27 15:12:14 asr3/gtx1080 999999937 LL 30000 0.00%; 50794 us/it; ETA 587d 21:05; d79b942904525426
2021-03-27 15:20:42 asr3/gtx1080 999999937 LL 40000 0.00%; 50823 us/it; ETA 588d 05:02; 650ca8c106ac7d12
2021-03-27 15:29:11 asr3/gtx1080 999999937 LL 50000 0.01%; 50869 us/it; ETA 588d 17:34; 98420630c9a4b877
2021-03-27 15:37:39 asr3/gtx1080 999999937 LL 60000 0.01%; 50850 us/it; ETA 588d 12:03; adb31fa7cf23b0ba
2021-03-27 15:46:08 asr3/gtx1080 999999937 LL 70000 0.01%; 50847 us/it; ETA 588d 11:13; 20b88580b0d5c8a5
2021-03-27 15:54:38 asr3/gtx1080 999999937 LL 80000 0.01%; 51002 us/it; ETA 590d 06:05; 2b4ab3e44fcbdc84
2021-03-27 16:03:07 asr3/gtx1080 999999937 LL 90000 0.01%; 50957 us/it; ETA 589d 17:20; b9c6eeca4e553904
2021-03-27 16:11:37 asr3/gtx1080 999999937 LL 100000 0.01%; 50972 us/it; ETA 589d 21:24; e776f4a0dcd3491d
(there was a long dormant period for the partial run)
2022-04-19 07:41:53 asr3/gtx1080 999999937 OK 900000 (jacobi == -1)
2022-04-19 08:24:45 asr3/gtx1080 999999937 LL 1000000 0.10%; 51451 us/it; ETA 594d 21:37; 141a108c13a86d5a
2022-04-19 09:07:37 asr3/gtx1080 999999937 LL 1050000 0.11%; 51422 us/it; ETA 594d 12:53; b1b3f4ff1e3553fb
2022-04-19 09:07:37 asr3/gtx1080 999999937 OK 1000000 (jacobi == -1)
999999937 Iteration 110,000,000 Residue64 0x87ded9d3ab3e7f38
See also https://mersenneforum.org/showthread...859#post495859
The PRP equivalent is marginally feasible with Gpuowl on a Radeon VII GPU, at ~0.5 years to complete; LL in Gpuowl on a Radeon VII is also possible but not recommended, since it is protected only by an occasional Jacobi check, which has a ~50% chance of detecting an error. Such long runs are likely to go wrong with undetected errors.
Gpuowl v6.2-e2ffe65 PRP on RX480 GPU:
Code:
2019-02-08 17:48:10 condorella/rx480 999999937 10000 0.00%; 72.02 ms/sq; ETA 833d 14:01; a30a0c45e9fb828c
2019-02-08 21:16:26 condorella/rx480 999999937 100000 0.01%; 74.38 ms/sq; ETA 860d 18:51; 3efc806b68d92b86
Code:
2021-03-16 12:20:54 condorella/rx480 999999937 OK 1000000 0.10%; 49236 us/it; ETA 569d 07:00; e1b53ccf581f928b (check 22.08s) Code:
[2021-11-13 00:04:26] M999999937 Iter# = 10000 [ 0.00% complete] clocks = 01:12:27.209 [434.7210 msec/iter] Res64: A30A0C45E9FB828C. AvgMaxErr = 0.033546355. MaxErr = 0.046875000. Residue shift count = 118809714.
[2021-11-13 08:19:25] M999999937 Iter# = 100000 [ 0.01% complete] clocks = 00:33:06.763 [198.6764 msec/iter] Res64: 3EFC806B68D92B86. AvgMaxErr = 0.030404632. MaxErr = 0.039062500. Residue shift count = 423797417.
1.25Gbit experiment
CUDALucas on GTX1080 wrote a save file and stopped upon request.
Code:
Using threads: square 512, splice 64.
Starting M1250000033 fft length = 73728K
SIGINT caught, writing checkpoint.
Estimated time spent so far: 12:18
Code:
| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Mar 28 13:21:04 | M1250000033 10000 0xf40e592716cd5c7a | 73728K 0.12500 75.3094 11.14s | 1084:17:26:53 0.00% |
Code:
2021-03-30 18:45:38 gpuowl v6.11-380-g79ea0cc
2021-03-30 18:45:38 config: -user kriesel -cpu asr2/radeonvii4 -d 4 -use NO_ASM -maxAlloc 15000 -cleanup -block 1000 -log 10000
2021-03-30 18:45:38 device 4, unique id ''
2021-03-30 18:45:38 asr2/radeonvii4 1250000033 FFT: 72M 4K:9:1K (16.56 bpw)
2021-03-30 18:45:38 asr2/radeonvii4 Expected maximum carry32: 6CC80000
2021-03-30 18:45:49 asr2/radeonvii4 OpenCL args "-DEXP=1250000033u -DWIDTH=4096u -DSMALL_HEIGHT=1024u -DMIDDLE=9u -DPM1=0 -DAMDGPU=1 -DWEIGHT_STEP_MINUS_1=0xb.819f9530b86cp-5 -DIWEIGHT_STEP_MINUS_1=-0x8.76945b26097f8p-5 -DNO_ASM=1 -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2021-03-30 18:45:54 asr2/radeonvii4 OpenCL compilation in 4.79 s
2021-03-30 18:46:00 asr2/radeonvii4 1250000033 LL 0 loaded: 0000000000000004
2021-03-30 18:48:54 asr2/radeonvii4 1250000033 LL 10000 0.00%; 17459 us/it; ETA 252d 13:58; 5a7f1ee464e7c654
2021-03-30 18:51:48 asr2/radeonvii4 1250000033 LL 20000 0.00%; 17428 us/it; ETA 252d 03:27; 1bdc8fcb27f794f5
2021-03-30 18:52:26 asr2/radeonvii4 Stopping, please wait..
2021-03-30 18:52:29 asr2/radeonvii4 1250000033 LL 22000 0.00%; 20088 us/it; ETA 290d 14:53; b2fc2c1ace615898
2021-03-30 18:52:29 asr2/radeonvii4 waiting for the Jacobi check to finish..
2021-03-30 19:11:16 asr2/radeonvii4 1250000033 OK 22000 (jacobi == -1)
Code:
INFO: Maximum recommended exponent for FFT length (73728 Kdbl) = 1315825018; p[ = 1250000033]/pmax_rec = 0.9499743628.
Initial DWT-multipliers chain length = [long] in carry step.
M1250000033: using FFT length 73728K = 75497472 8-byte floats, initial residue shift count = 354702250
this gives an average 16.556846208042568 bits per digit
Using complex FFT radices 36 16 16 16 16 16
mers_mod_square: radix0/2 not exactly divisible by NTHREADS - This will hurt performance.
10000 iterations of M1250000033 with FFT length 75497472 = 73728 K, final residue shift count = 1034108523
Res64: 5A7F1EE464E7C654. AvgMaxErr = 0.085587384. MaxErr = 0.113281250. Program: E20.1.1
CUDALucas on GTX1080Ti failed to match residues in 4 of 4 attempts by batch file, and rapidly crashed.
Code:
CUDALucas v2.06beta 64-bit build, compiled May 5 2017 @ 13:02:54
...
Using threads: square 32, splice 1024.
Starting M1250000033 fft length = 73728K
| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Nov 10 08:34:52 | M1250000033 10000 0x5b7eb1ec98fce96f | 73728K 0.12305 51.9534 519.53s | 751:15:14:39 0.00% |
(program crash & auto restart by batch file)
Using threads: square 32, splice 1024.
Starting M1250000033 fft length = 73728K
| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Nov 10 08:52:32 | M1250000033 10000 0x904cf0517c82f343 | 73728K 0.11816 52.2756 522.75s | 756:07:06:22 0.00% |
| Nov 10 09:01:15 | M1250000033 20000 0x10231dfa9495f5c3 | 73728K 0.12500 52.3162 523.16s | 756:14:01:09 0.00% |
(program crash & auto restart by batch file)
Using threads: square 32, splice 1024.
Starting M1250000033 fft length = 73728K
| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Nov 10 09:19:48 | M1250000033 10000 0xffffffffffffffff | 73728K 0.11914 52.1692 521.69s | 754:18:11:06 0.00% |
| Nov 10 09:28:32 | M1250000033 20000 0x0000000000000000 | 73728K 0.11719 52.4310 524.31s | 756:15:29:04 0.00% |
Illegal residue: 0x0000000000000000. See mersenneforum.org for help.
(program crash & auto restart by batch file)
Using threads: square 32, splice 1024.
Starting M1250000033 fft length = 73728K
| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Nov 10 09:47:19 | M1250000033 10000 0x1ac66811a2df6b09 | 73728K 0.12500 51.9302 519.30s | 751:07:12:15 0.00% |
| Nov 10 09:56:03 | M1250000033 20000 0xa82b4d1e98e888a5 | 73728K 0.12109 52.3941 523.94s | 754:15:35:19 0.00% |
(batch file gives up after 4 attempts)
Code:
[2022-05-09 20:09:55] M1250000033 Iter# = 100000 [ 0.01% complete] clocks = 08:41:50.153 [313.1015 msec/iter] Res64: 95C8FAB6227CB3AE. AvgMaxErr = 0.128896556. MaxErr = 0.187500000. Residue shift count = 0.
These were performed after the 2Gbit, 1.5Gbit, and 1.25Gbit experiments, to determine the approximate transition point for CUDALucas. The following are in exponent order, not chronological order. Note that after the initial experimentation on the GTX1080, running PRP/GEC in gpuowl on that GPU showed a significant error rate, so the experiments may reflect that GPU's unreliability more than CUDALucas's. CUDALucas is being retested on a different GPU.
1.40G (CUDALucas on GTX1080; system RAM does not have ECC)
Code:
Using threads: square 1024, splice 32.
Starting M1400000197 fft length = 81920K
| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Mar 27 09:32:44 | M1400000197 10000 0x3064425057775e29 | 81920K 0.17188 85.1755 851.75s | 1380:03:35:56 0.00% |
exit at Sat 03/27/2021 9:47:05.96
Code:
Starting M1400000197 fft length = 81920K
| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Nov 09 12:23:40 | M1400000197 10000 0xb98557f593be218e | 81920K 0.16406 59.9727 599.72s | 971:18:34:45 0.00% |
| Nov 09 12:33:49 | M1400000197 20000 0xb391b97d3409a2b5 | 81920K 0.17188 60.8700 608.70s | 979:00:53:16 0.00% |
Code:
INFO: Maximum recommended exponent for FFT length (81920 Kdbl) = 1458483632; p[ = 1400000197]/pmax_rec = 0.9599012058.
Initial DWT-multipliers chain length = [long] in carry step.
M1400000197: using FFT length 81920K = 83886080 8-byte floats, initial residue shift count = 1225430952
this gives an average 16.689302885532378 bits per digit
Using complex FFT radices 320 16 16 16 32
Using 8 threads in carry step
10000 iterations of M1400000197 with FFT length 83886080 = 81920 K, final residue shift count = 1261367293
Res64: 23BEDBCBD2F66C97. AvgMaxErr = 0.135606808. MaxErr = 0.187500000. Program: E20.1.1
Code:
2021-11-09 12:49:24 gpuowl v6.11-380-g79ea0cc
2021-11-09 12:49:24 config: -device 3 -user kriesel -cpu asr2/radeonii3 -block 1000 -log 10000 -use NO_ASM -proof 9
2021-11-09 12:49:24 device 3, unique id ''
2021-11-09 12:49:24 asr2/radeonii3 1400000197 FFT: 80M 4K:10:1K (16.69 bpw)
2021-11-09 12:49:24 asr2/radeonii3 Expected maximum carry32: 7E760000
2021-11-09 12:49:37 asr2/radeonii3 OpenCL args "-DEXP=1400000197u -DWIDTH=4096u -DSMALL_HEIGHT=1024u -DMIDDLE=10u -DPM1=0 -DAMDGPU=1 -DCARRY64=1 -DWEIGHT_STEP_MINUS_1=0xf.6130165dfeb5p-6 -DIWEIGHT_STEP_MINUS_1=-0xc.665dab2c7ba2p-6 -DNO_ASM=1 -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2021-11-09 12:49:42 asr2/radeonii3 OpenCL compilation in 5.08 s
2021-11-09 12:49:49 asr2/radeonii3 1400000197 LL 0 loaded: 0000000000000004
2021-11-09 12:53:13 asr2/radeonii3 1400000197 LL 10000 0.00%; 20439 us/it; ETA 331d 04:27; 23bedbcbd2f66c97
2021-11-09 12:56:35 asr2/radeonii3 Stopping, please wait..
2021-11-09 12:56:38 asr2/radeonii3 1400000197 LL 20000 0.00%; 20449 us/it; ETA 331d 08:18; 75f7dcd66b5c154a
2021-11-09 12:56:38 asr2/radeonii3 waiting for the Jacobi check to finish..
2021-11-09 13:18:18 asr2/radeonii3 Bye
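The bits-per-word figures in these logs are simply the exponent divided by the FFT length in words, with K = 1024 words. This covers gpuowl's "bpw" and Mlucas's "average ... bits per digit". Mlucas's p/pmax_rec is the exponent divided by the recommended maximum exponent that it prints for that FFT length. Packing more bits into each word raises round-off error. That is consistent with MaxErr 0.1875 at 16.69 bpw for 1.40G versus 0.28125 at 17.05 bpw for 1.43G in the Mlucas runs in this post. The following is a small Python sketch; the function names are illustrative, and the maximum exponent is taken from the Mlucas log rather than computed.

```python
K = 1024  # FFT lengths such as "81920K" count words in units of 1024

def bits_per_word(p: int, fft_words: int) -> float:
    """Average number of bits of the p-bit residue carried by each FFT word."""
    return p / fft_words

def headroom(p: int, p_max: int) -> float:
    """Fraction of the FFT length's recommended maximum exponent in use."""
    return p / p_max

print(bits_per_word(1400000197, 81920 * K))   # Mlucas log: 16.689302885532378
print(bits_per_word(1430000027, 81920 * K))   # Mlucas log: 17.046928727626799
print(headroom(1400000197, 1458483632))       # Mlucas log: 0.9599012058
```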
1.43G is about the highest that would run in CUDALucas on the GTX1080 long enough to produce any printed interim residues; 1.45Gbit and higher failed early. Code:
Starting M1430000027 fft length = 81920K
| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Mar 27 11:28:40 | M1430000027 10000 0xa58244f8b0e73cde | 81920K 0.26563 85.7222 857.22s | 1418:18:31:51 0.00% |
| Mar 27 11:42:59 | M1430000027 20000 0x210de9223d1c0ca0 | 81920K 0.29688 85.9697 859.69s | 1420:19:27:07 0.00% |
| Mar 27 11:57:19 | M1430000027 30000 0xf17cde78a6256ad9 | 81920K 0.28125 85.9585 859.58s | 1421:10:07:19 0.00% |
| Mar 27 12:11:38 | M1430000027 40000 0x6863e2b56e6e5d89 | 81920K 0.28125 85.9554 859.55s | 1421:17:01:24 0.00% |
exit at Sat 03/27/2021 12:26:00.77
Code:
2021-11-09 13:30:05 gpuowl v6.11-380-g79ea0cc 2021-11-09 13:30:05 config: -device 3 -user kriesel -cpu asr2/radeonii3 -block 1000 -log 10000 -use NO_ASM -proof 9 2021-11-09 13:30:05 device 3, unique id '' 2021-11-09 13:30:05 asr2/radeonii3 worktodo.txt line ignored: ";DoubleCheck=1400000197" 2021-11-09 13:30:05 asr2/radeonii3 1430000027 FFT: 80M 4K:10:1K (17.05 bpw) 2021-11-09 13:30:05 asr2/radeonii3 Expected maximum carry32: A20A0000 2021-11-09 13:30:17 asr2/radeonii3 OpenCL args "-DEXP=1430000027u -DWIDTH=4096u -DSMALL_HEIGHT=1024u -DMIDDLE=10u -DPM1=0 -DAMDGPU=1 -DCARRY64=1 -DMAX_ACCURACY=1 -DWEIGHT_STEP_MINUS_1=0xe.f9d0543ab1a5p-4 -DIWEIGHT_STEP_MINUS_1=-0xf.7892905b96ebp-5 -DNO_ASM=1 -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only " 2021-11-09 13:30:22 asr2/radeonii3 OpenCL compilation in 5.13 s 2021-11-09 13:30:29 asr2/radeonii3 1430000027 LL 0 loaded: 0000000000000004 2021-11-09 13:33:50 asr2/radeonii3 1430000027 LL 10000 0.00%; 20141 us/it; ETA 333d 08:23; 19b5110e4fd08ef6 Code:
INFO: Maximum recommended exponent for FFT length (81920 Kdbl) = 1458483632; p[ = 1430000027]/pmax_rec = 0.9804703979. Initial DWT-multipliers chain length = [short] in carry step. M1430000027: using FFT length 81920K = 83886080 8-byte floats, initial residue shift count = 492342517 this gives an average 17.046928727626799 bits per digit Using complex FFT radices 320 16 16 16 32 Using 8 threads in carry step 10000 iterations of M1430000027 with FFT length 83886080 = 81920 K, final residue shift count = 891887602 Res64: 19B5110E4FD08EF6. AvgMaxErr = 0.217721720. MaxErr = 0.281250000. Program: E20.1.1 Code:
Using threads: square 1024, splice 32. Starting M1430000027 fft length = 81920K | Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done | | Nov 09 12:56:20 | M1430000027 10000 0xffffffffffffffff | 81920K 0.28125 60.7330 607.33s | 1005:04:21:07 0.00% | | Nov 09 13:06:27 | M1430000027 20000 0xffffffffffffffff | 81920K 0.28125 60.7226 607.22s | 1005:02:06:31 0.00% | | Nov 09 13:16:35 | M1430000027 30000 0x24fc53761b0533fc | 81920K 0.28125 60.7770 607.77s | 1005:08:27:36 0.00% | Code:
Starting M1430000027 fft length = 81920K | Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done | | Nov 09 13:33:15 | M1430000027 10000 0x050701ca0c98d890 | 81920K 0.28125 60.5683 605.68s | 1002:10:56:38 0.00% | An independent try in gpuowl on the GTX1080Ti produced a res64 matching the gpuowl and Mlucas values above: Code:
2021-11-09 14:25:34 gpuowl v6.11-380-g79ea0cc 2021-11-09 14:25:34 config: -device 0 -user kriesel -cpu test/GTX1080Ti -maxAlloc 7500 -proof 9 -use NO_ASM -log 10000 -yield 2021-11-09 14:25:34 device 0, unique id '' 2021-11-09 14:25:34 test/GTX1080Ti 1430000027 FFT: 80M 4K:10:1K (17.05 bpw) 2021-11-09 14:25:34 test/GTX1080Ti Expected maximum carry32: A20A0000 2021-11-09 14:25:46 test/GTX1080Ti OpenCL args "-DEXP=1430000027u -DWIDTH=4096u -DSMALL_HEIGHT=1024u -DMIDDLE=10u -DPM1=0 -DCARRY64=1 -DMAX_ACCURACY=1 -DWEIGHT_STEP_MINUS_1=0xe.f9d0543ab1a5p-4 -DIWEIGHT_STEP_MINUS_1=-0xf.7892905b96ebp-5 -DNO_ASM=1 -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only " 2021-11-09 14:25:46 test/GTX1080Ti 2021-11-09 14:25:46 test/GTX1080Ti OpenCL compilation in 0.02 s 2021-11-09 14:25:49 test/GTX1080Ti 1430000027 LL 0 loaded: 0000000000000004 2021-11-09 14:35:02 test/GTX1080Ti 1430000027 LL 10000 0.00%; 55245 us/it; ETA 914d 08:19; 19b5110e4fd08ef6 At 1.44G CUDALucas struggled and crashed on the GTX1080. Code:
Using threads: square 32, splice 32. Starting M1440000083 fft length = 82944K Round off error at iteration = 100, err = 0.5 > 0.35, fft = 82944K. Restarting from last checkpoint to see if the error is repeatable. Using threads: square 32, splice 32. Starting M1440000083 fft length = 82944K Round off error at iteration = 100, err = 0.5 > 0.35, fft = 82944K. The error persists. Trying a larger fft until the next checkpoint. Using threads: square 32, splice 128. Starting M1440000083 fft length = 84672K Code:
Using threads: square 32, splice 128. Starting M1440000083 fft length = 82944K Round off error at iteration = 100, err = 0.5 > 0.35, fft = 82944K. Restarting from last checkpoint to see if the error is repeatable. Using threads: square 32, splice 128. Starting M1440000083 fft length = 82944K Round off error at iteration = 100, err = 0.5 > 0.35, fft = 82944K. The error persists. Trying a larger fft until the next checkpoint. Using threads: square 32, splice 1024. Starting M1440000083 fft length = 84672K | Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done | | Nov 09 22:14:53 | M1440000083 10000 0x58961e35a62cf0a1 | 84672K 0.14063 66.4872 664.87s | 1108:02:43:39 0.00% | Resettng fft. Code:
2021-11-11 04:29:06 gpuowl v6.11-380-g79ea0cc 2021-11-11 04:29:06 config: -device 0 -user kriesel -cpu test/GTX1080Ti -maxAlloc 7500 -proof 9 -use NO_ASM -log 10000 -yield 2021-11-11 04:29:06 device 0, unique id '' 2021-11-11 04:29:06 test/GTX1080Ti 1440000083 FFT: 80M 4K:10:1K (17.17 bpw) 2021-11-11 04:29:06 test/GTX1080Ti Expected maximum carry32: AFFF0000 2021-11-11 04:29:16 test/GTX1080Ti OpenCL args "-DEXP=1440000083u -DWIDTH=4096u -DSMALL_HEIGHT=1024u -DMIDDLE=10u -DPM1=0 -DCARRY64=1 -DMM_CHAIN=1u -DMM2_CHAIN=1u -DMAX_ACCURACY=1 -DWEIGHT_STEP_MINUS_1=0xc.84e9e985fb5f8p-4 -DIWEIGHT_STEP_MINUS_1=-0xe.0c13e5b57c4bp-5 -DNO_ASM=1 -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only " 2021-11-11 04:29:22 test/GTX1080Ti 2021-11-11 04:29:22 test/GTX1080Ti OpenCL compilation in 6.22 s 2021-11-11 04:29:24 test/GTX1080Ti 1440000083 LL 0 loaded: 0000000000000004 2021-11-11 04:38:43 test/GTX1080Ti 1440000083 LL 10000 0.00%; 55823 us/it; ETA 930d 08:55; 757f1958eea39b48 2021-11-11 04:48:02 test/GTX1080Ti 1440000083 LL 20000 0.00%; 55897 us/it; ETA 931d 14:24; ca704e1ad71c4d7f ... 2021-11-11 05:15:59 test/GTX1080Ti 1440000083 LL 50000 0.00%; 55910 us/it; ETA 931d 19:07; cbe95431712b43dc 2021-11-11 05:20:37 test/GTX1080Ti Stopping, please wait.. 2021-11-11 05:20:39 test/GTX1080Ti 1440000083 LL 55000 0.00%; 56128 us/it; ETA 935d 10:25; dcca548706633818 2021-11-11 05:20:39 test/GTX1080Ti waiting for the Jacobi check to finish.. 2021-11-11 05:37:11 test/GTX1080Ti 1440000083 OK 55000 (jacobi == -1) Code:
[2022-05-01 20:27:50] M1440000083 Iter# = 10000 [ 0.00% complete] clocks = 00:59:38.481 [357.8482 msec/iter] Res64: 757F1958EEA39B48. AvgMaxErr = 0.216583227. MaxErr = 0.281250000. Residue shift count = 744135935. Code:
Using threads: square 32, splice 32. Starting M1450000043 fft length = 82944K CUDALucas on the GTX1080Ti does not produce repeatable results, and likely all of its 10k-iteration res64 values are wrong. Code:
CUDALucas v2.06beta 64-bit build, compiled May 5 2017 @ 13:02:54 Using threads: square 32, splice 128. Starting M1450000043 fft length = 82944K (program crash, batch file auto restarts task) Using threads: square 32, splice 128. Starting M1450000043 fft length = 82944K | Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done | | Nov 10 17:59:54 | M1450000043 10000 0x23b2ba4d44537db3 | 82944K 0.28125 61.7818 617.81s | 1036:20:11:26 0.00% | | Nov 10 18:10:14 | M1450000043 20000 0xc7a4f2c40f16bbdd | 82944K 0.29688 61.9543 619.54s | 1038:06:45:33 0.00% | (program crash, batch file auto restarts task) Using threads: square 32, splice 128. Starting M1450000043 fft length = 82944K | Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done | | Nov 10 18:32:00 | M1450000043 10000 0x61aa46b9c8b2d0a0 | 82944K 0.28906 61.8699 618.69s | 1038:07:40:38 0.00% | | Nov 10 18:42:23 | M1450000043 20000 0x0000000000000000 | 82944K 0.31250 62.2602 622.60s | 1041:14:05:40 0.00% | Illegal residue: 0x0000000000000000. See mersenneforum.org for help. (program exit, batch file auto restarts task) Using threads: square 32, splice 128. Starting M1450000043 fft length = 82944K | Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done | | Nov 10 19:02:41 | M1450000043 10000 0x4fa6a82c75016f4f | 82944K 0.29688 61.3545 613.54s | 1029:16:04:31 0.00% | (program crash, batch file exits) Code:
2022-05-01 21:51:50 test/radeonvii 1450000043 LL 10000 0.00%; 19064 us/it; ETA 319d 22:26; 654ee6a96681050a 2022-05-01 22:20:23 test/radeonvii 1450000043 LL 100000 0.01%; 18998 us/it; ETA 318d 19:32; 96656182cba4bb39 Code:
[2022-05-02 09:19:43] M1450000043 Iter# = 10000 [ 0.00% complete] clocks = 00:48:18.435 [289.8435 msec/iter] Res64: 654EE6A96681050A. AvgMaxErr = 0.254127446. MaxErr = 0.328125000. Residue shift count = 1420926634. Code:
2022-05-04 13:55:46 test/radeonvii 1450000043 OK 10000 0.00%; 18668 us/it; ETA 313d 07:07; c51ba6c28e5adfb9 (check 41.96s) 2022-05-04 13:59:34 test/radeonvii 1450000043 OK 20000 0.00%; 18631 us/it; ETA 312d 16:05; 5d83daadd9f021f4 (check 41.80s) 2022-05-04 14:10:58 test/radeonvii 1450000043 OK 50000 0.00%; 18625 us/it; ETA 312d 13:23; d9dd1f8b2addb16d (check 41.72s) 2022-05-04 14:30:00 test/radeonvii 1450000043 OK 100000 0.01%; 18657 us/it; ETA 313d 02:10; a1d5ce3bb8126dd7 (check 41.97s) Code:
[2022-05-04 21:30:49] M1450000043 Iter# = 10000 [ 0.00% complete] clocks = 00:59:44.029 [358.4030 msec/iter] Res64: C51BA6C28E5ADFB9. AvgMaxErr = 0.254497920. MaxErr = 0.343750000. Residue shift count = 0. [2022-05-04 22:30:44] M1450000043 Iter# = 20000 [ 0.00% complete] clocks = 00:59:26.868 [356.6868 msec/iter] Res64: 5D83DAADD9F021F4. AvgMaxErr = 0.255102744. MaxErr = 0.328125000. Residue shift count = 0. Gpuowl v4.6 PRP on an RX480 (not recommended, at an estimated 7.8 years to completion; also moot, since M1500000041 has a known small factor). B1 bound 0, so presumably base 3. Code:
2018-11-01 16:35:37 condorella-rx480 10000/1500000041 [ 0.00%], 164.51 ms/it [163.50, 174.58]; ETA 2856d 01:42; CUDALucas v2.06 on a GTX1080 failed on 1,500,000,043 with an error similar to that for 2Gbit (see below). Code:
Using threads: square 512, splice 128. Starting M1500000043 fft length = 86400K Round off error at iteration = 100, err = 0.5 > 0.35, fft = 86400K. Something is wrong! Quitting. Code:
2022-05-04 14:56:40 test/radeonvii 1500000043 OK 10000 0.00%; 20466 us/it; ETA 355d 07:35; d77ad2df221f4174 (check 45.60s) 2022-05-04 15:00:50 test/radeonvii 1500000043 OK 20000 0.00%; 20483 us/it; ETA 355d 14:34; d72694cae693cb89 (check 45.70s) ... 2022-05-04 15:13:22 test/radeonvii 1500000043 OK 50000 0.00%; 20480 us/it; ETA 355d 12:56; cac725527de70889 (check 45.81s) ... 2022-05-04 15:34:15 test/radeonvii 1500000043 OK 100000 0.01%; 20462 us/it; ETA 355d 05:11; 0ac335566f192292 (check 45.83s) Code:
[2022-05-09 14:47:17] M1500000043 Iter# = 10000 [ 0.00% complete] clocks = 02:26:33.365 [879.3366 msec/iter] Res64: D77AD2DF221F4174. AvgMaxErr = 0.000414304. MaxErr = 0.000549316. Residue shift count = 158099465. M2147483563 (https://www.mersenne.ca/exponent/2147483563) is theoretically just within range of CUDALucas. It would need a lot of TF and P-1 before a serious LL attempt would be worthwhile. Estimated time on a GTX1080 is several years to completion, with little or no chance of correct completion without the Jacobi check or better. In initial attempts to obtain LL timing on a GTX 1080, CUDALucas halted repeatedly before producing any timing data. A few more attempts were made after tuning the application for the GPU on the system where it's currently installed. CUDALucas initially selects a properly sized fft length, decides that it is too large, drops to one that is too small, and then terminates on excessive round-off error, even if a proper fft length is specified on the command line. Code:
The fft length 131072K is too large for exponent 2147483563, decreasing to 116640K Using threads: square 1024, splice 64. Starting M2147483563 fft length = 116640K Round off error at iteration = 100, err = 0.5 > 0.35, fft = 116640K. Something is wrong! Quitting. Code:
Device GeForce GTX 1080
Compatibility 6.1
clockRate (MHz) 1797
memClockRate (MHz) 5005

   fft     max exp   ms/iter
 76832  1335757897   83.2664
 81920  1422251777   83.9003
 82944  1439645131   84.9825
 84672  1468986017   91.3988
 86400  1498314007   94.8596
 93312  1615502269   96.7309
 96768  1674025489   99.3981
 98304  1700021251  103.3561
100352  1734668777  105.2496
102400  1769301077  110.3034
104976  1812840839  112.3060
110592  1907684153  113.6384
114688  1976791967  117.1181
115200  1985426669  124.0529
116640  2009707367  131.0044
131072  2147483647  131.6075
The issue was reproducible on 2,000,000,099 and some lower values. Code:
Using threads: square 1024, splice 64. Starting M2000000099 fft length = 115200K Round off error at iteration = 100, err = 0.5 > 0.35, fft = 115200K. Restarting from last checkpoint to see if the error is repeatable. Using threads: square 1024, splice 64. Starting M2000000099 fft length = 115200K Round off error at iteration = 100, err = 0.5 > 0.35, fft = 115200K. The error persists. Trying a larger fft until the next checkpoint. Using threads: square 1024, splice 64. Starting M2000000099 fft length = 116640K Round off error at iteration = 100, err = 0.5 > 0.35, fft = 116640K. Something is wrong! Quitting. Gpuowl V6.11-380 on a Radeon VII (subsequent Jacobi symbol check successful): Code:
2022-05-02 09:12:03 test/radeonvii 2000000099 LL 10000 0.00%; 28213 us/it; ETA 653d 02:00; f7b1319c033334c2 Code:
2021-03-26 13:57:40 asr2/radeonvii4 2147483563 P1 B1=11000000, B2=600000000; 15869712 bits; starting at 0 2021-03-26 14:02:48 asr2/radeonvii4 saved 2021-03-26 14:03:21 asr2/radeonvii4 2147483563 P1 10000 0.06%; 34113 us/it; ETA 6d 06:17; d78cbf554970d8c1 Gpuowl V6.11-380 (subsequent Jacobi symbol check successful): Code:
2022-05-02 10:31:42 test/radeonvii 2147483563 LL 10000 0.00%; 30706 us/it; ETA 763d 04:33; b2a28af43a012b86 2022-05-02 11:17:38 test/radeonvii 2147483563 LL 100000 0.00%; 30615 us/it; ETA 760d 21:28; 1605bbbbc380ca76 2022-05-02 12:08:42 test/radeonvii 2147483563 LL 200000 0.01%; 30662 us/it; ETA 762d 00:57; 4b360c36788550fc Code:
[2022-05-04 20:37:34] M2147483563 Iter# = 10000 [ 0.00% complete] clocks = 01:13:43.354 [442.3354 msec/iter] Res64: B2A28AF43A012B86. AvgMaxErr = 0.234474211. MaxErr = 0.296875000. Residue shift count = 880753044. Code:
2021-03-26 12:43:44 asr2/radeonvii4 2147483743 OK 20000 0.00%; 38636 us/it; ETA 960d 07:01; 304a08968e896b20 (check 23.14s) 2021-03-26 13:36:46 asr2/radeonvii4 2147483743 OK 100000 0.00%; 38636 us/it; ETA 960d 05:56; 0637bf5d33611a9d (check 22.85s) Code:
[2022-05-01 22:01:47] M2147483743 Iter# = 10000 [ 0.00% complete] clocks = 01:11:35.048 [429.5048 msec/iter] Res64: FDB3067B242478FA. AvgMaxErr = 0.234596052. MaxErr = 0.312500000. Residue shift count = 1753623453. [2022-05-01 23:14:04] M2147483743 Iter# = 20000 [ 0.00% complete] clocks = 01:11:42.295 [430.2295 msec/iter] Res64: 304A08968E896B20. AvgMaxErr = 0.235129663. MaxErr = 0.312500000. Residue shift count = 674700330. Code:
[2022-05-01 21:58:16] M2147483743 Iter# = 10000 [ 0.00% complete] clocks = 01:18:25.244 [470.5244 msec/iter] Res64: 94A87DFAE884EB75. AvgMaxErr = 0.234346936. MaxErr = 0.312500000. Residue shift count = 1552261820. [2022-05-02 09:45:49] M2147483743 Iter# = 100000 [ 0.00% complete] clocks = 01:18:24.558 [470.4559 msec/iter] Res64: 3A997CAD52805EF0. AvgMaxErr = 0.234872706. MaxErr = 0.296875000. Residue shift count = 1308916996. There may be some naive interest in attempting gigadigit exponents, since they would potentially qualify for the largest EFF prize. Sufficient TF is feasible with mfaktx and a fast GPU, or with a lot of patience. Some exponents have been trial factored to approximately sufficient bit depth, and others are in progress. Until Mlucas ~v20.1, there was no GIMPS software suitable for P-1 factoring gigadigit Mersenne numbers in preparation for a primality test attempt. As an experiment, a copy of Gpuowl v6.11-219 was modified to raise the P-1 exponent limit as a function of fft length, enough to permit P-1 attempts on gigabit exponents at its highest fft length. It was tested at a much smaller exponent and fft length, near the expanded P-1 bits/word limit, and failed to find a known factor, as one might expect. (George had identified a factor of 3 as a reason for slightly lower usable bits/word in the fft lengths for P-1 than for PRP or LL.) P-1 factoring of gigadigit exponents will require a larger fft length than 192M, the largest ever offered in Gpuowl. Estimated runtime for a gigadigit stage 1 P-1 run on a Radeon VII would be ~1-2 months per exponent, which is impractically long without robust error correction such as that introduced in Gpuowl V7.2 P-1. Gpuowl V7.2 P-1 stage 2 requires a minimum of 24 buffers, increasing the GPU memory requirement considerably; I estimate ~40GiB of GPU ram would be required. Mlucas V20 or later and a select few versions of gpuowl could nominally run the primality tests. (CUDALucas, mprime/prime95, and cllucas cannot.)
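The sizing arithmetic behind the fft-length and memory remarks can be sketched in a few lines of Python. This sketch makes three assumptions. First, it reads the "max exp" column of the CUDALucas GTX 1080 table above as the largest exponent that program accepts at each fft length. Second, it treats bits/word as the exponent divided by the fft length in words. Third, it assumes each gpuowl buffer holds one 8-byte double per fft word, with no other allocations. The function and table names are hypothetical.

```python
import math

# (fft length in K words, "max exp") rows copied from the CUDALucas GTX 1080
# table above; ms/iter omitted. Reading "max exp" as the largest exponent
# CUDALucas accepts at that length is an assumption.
GTX1080_FFT_TABLE = [
    (76832, 1335757897), (81920, 1422251777), (82944, 1439645131),
    (84672, 1468986017), (86400, 1498314007), (93312, 1615502269),
    (96768, 1674025489), (98304, 1700021251), (100352, 1734668777),
    (102400, 1769301077), (104976, 1812840839), (110592, 1907684153),
    (114688, 1976791967), (115200, 1985426669), (116640, 2009707367),
    (131072, 2147483647),
]


def smallest_adequate_fft(p, table=GTX1080_FFT_TABLE):
    """Smallest fft length (K) whose max exponent is >= p, or None if none is."""
    for fft_k, max_exp in sorted(table):
        if max_exp >= p:
            return fft_k
    return None


def bits_per_word(p, fft_k):
    """Average bits per fft word: exponent / (fft length in words)."""
    return p / (fft_k * 1024)


def buffers_gib(fft_k, n_buffers, bytes_per_word=8):
    """GiB for n_buffers arrays of one double per fft word (assumed layout;
    twiddle tables, checkpoints and other allocations are ignored)."""
    return n_buffers * fft_k * 1024 * bytes_per_word / 2**30


def decimal_digits(p):
    """Decimal digits of 2^p - 1 (same as 2^p, never a power of 10).
    Uses floating point, which is fine away from digit-count boundaries."""
    return math.floor(p * math.log10(2)) + 1
```

With these definitions:

* `smallest_adequate_fft(2147483563)` returns 131072. That is the length CUDALucas first picked before "decreasing to 116640K".
* `bits_per_word(1400000197, 81920)` reproduces the 16.689302885532378 bits per digit that Mlucas printed.
* `buffers_gib(196608, 24)` is exactly 36.0. So 24 buffers at Gpuowl's largest (192M) length already need 36 GiB before any larger P-1 fft length or other allocations, in line with the ~40GiB estimate.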
Primality test duration is too long on any available consumer-grade hardware. There are severe reliability issues with very long LL tests, so PRP/GEC would be required, and PRP proof generation would be strongly recommended. No software available now (2022-05-02) offers the necessary combination of large enough fft lengths, PRP, GEC, and PRP proof generation. Gpuowl v6.5-84 PRP3 (not recommended, at an estimated 6.3 years on a Radeon VII; lacks PRP proof capability). M3,321,928,097 has 1,000,000,001 decimal digits. (This exponent is moot, as it has multiple known small factors.) Code:
FFT 196608K: Width 512x8, Height 256x8, Middle 12; 16.50 bits/word 2020-05-24 17:21:43 radeonvii3 3,321,928,097 20000 0.00%; 60053 us/sq; ETA 2308d 21:54; 1a05f0ca51fb8e7a 2020-05-24 18:41:56 radeonvii3 3,321,928,097 100000 0.00%; 60122 us/sq; ETA 2311d 12:24; 5b2cb77f57840bcc A Gpuowl PRP attempt on a Radeon VII GPU yielded Code:
2021-11-12 13:37:12 gpuowl v6.5-84-g30c0508 2021-11-12 13:37:12 Note: no config.txt file found 2021-11-12 13:37:12 radeonvii3 config: -d 3 -user kriesel -cpu radeonvii3 -use NO_ASM -log 10000 2021-11-12 13:37:12 radeonvii3 3321928171 FFT 196608K: Width 512x8, Height 256x8, Middle 12; 16.50 bits/word 2021-11-12 13:37:12 radeonvii3 using long carry kernels 2021-11-12 13:37:13 radeonvii3 OpenCL args "-DEXP=3321928171u -DWIDTH=4096u -DSMALL_HEIGHT=2048u -DMIDDLE=12u -DWEIGHT_STEP=0xb.4fea9eebaf1ep-3 -DIWEIGHT_STEP=0xb.50b3cb11d398p-4 -DWEIGHT_BIGSTEP=0xc.5672a115506d8p-3 -DIWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-4 -DNO_ASM=1 -DNO_ASM=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0" 2021-11-12 13:37:17 radeonvii3 OpenCL compilation in 4096 ms 2021-11-12 13:37:57 radeonvii3 3321928171.owl not found, starting from the beginning. 2021-11-12 13:42:02 radeonvii3 3321928171 OK 2000 0.00%; 54123 us/sq; ETA 2080d 21:57; 56782571892f7722 (check 65.70s) 2021-11-12 13:49:18 radeonvii3 3321928171 10000 0.00%; 54518 us/sq; ETA 2096d 03:00; 2165debfa35a4d4f Mlucas v20.1.1 2022-03-20 PRP: Code:
[2022-05-04 17:01:24] M3321928171 Iter# = 10000 [ 0.00% complete] clocks = 02:08:58.093 [773.8093 msec/iter] Res64: 2165DEBFA35A4D4F. AvgMaxErr = 0.141962889. MaxErr = 0.187500000. Residue shift count = 1082059918. [2022-05-04 19:04:51] M3321928171 Iter# = 20000 [ 0.00% complete] clocks = 02:02:29.438 [734.9438 msec/iter] Res64: 9471341ED3F3D370. AvgMaxErr = 0.142553408. MaxErr = 0.187500000. Residue shift count = 106571623. Mlucas v20.1.1 2021-12-02 LL unverified interim residues. Both timing lines below were obtained in the same run on an i7-1165G7 laptop with 16 GiB non-ECC ram, Ubuntu/WSL/Win10; prime95 was stopped for the first line and also running for the second. Note that this Mlucas version explicitly does not support full iteration count computations exceeding 1M on exponents >~2^32 for LL, PRP, or Pepin. Run times to completion would be enormous on nearly all available computing hardware. The "faster" 3.62 second/iteration timing below corresponds to ~1,026 years to completion. Code:
[2022-02-07 20:36:50] M8937021911 Iter# = 1000 [ 0.10% complete] clocks = 01:00:20.155 [3620.1556 msec/iter] Res64: 97D908FE3A52408B. AvgMaxErr = 0.274295593. MaxErr = 0.375000000. Residue shift count = 0. [2022-02-08 10:19:21] M8937021911 Iter# = 10000 [ 1.00% complete] clocks = 01:37:03.718 [5823.7187 msec/iter] Res64: 2190CD0E45AEF927. AvgMaxErr = 0.283365601. MaxErr = 0.343750000. Residue shift count = 0. The interim results above, plus additional combinations obtained later and not tabulated here because of the length of this post, are summarized in the last attachment. Top of this thread https://www.mersenneforum.org/showthread.php?t=24003 Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1 Last fiddled with by kriesel on 2022-05-23 at 15:15 Reason: added mlucas & gpuowl interim results
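The year and day figures quoted in this post follow from iterations × time per iteration. Examples are ~1,026 years at 3.62 s/iteration for M8937021911, ~7.8 years on the RX480, ~6.3 years on the Radeon VII, and gpuowl's ETA of about 331 days for M1400000197. The bad-result estimate at the top of this post combines two ideas: run time scaling as ~p^2.1, and a constant per-unit-time chance of error. Below is a minimal Python sketch of both calculations. It assumes the iteration count is roughly p and a 365-day year; that year length reproduces the ~1,026 figure. The function names are hypothetical.

```python
SECONDS_PER_YEAR = 365 * 86400  # 365-day year (assumed); gives the ~1,026 years above


def years_to_complete(iterations: int, sec_per_iter: float) -> float:
    """Wall-clock years for a run at a constant time per iteration."""
    return iterations * sec_per_iter / SECONDS_PER_YEAR


def runtime_ratio(p: float, p_ref: float, scaling: float = 2.1) -> float:
    """Run time of exponent p relative to p_ref, assuming time ~ p**scaling."""
    return (p / p_ref) ** scaling


def prob_bad_result(ratio: float, bad_rate_ref: float = 0.20) -> float:
    """Chance of at least one error in a run `ratio` times as long as a
    reference run that goes bad with probability bad_rate_ref, assuming
    errors arrive independently at a constant rate per unit of run time."""
    return 1 - (1 - bad_rate_ref) ** ratio
```

For example, `prob_bad_result(runtime_ratio(999, 332))` comes to about 0.895, matching the ~89% quoted for a 999M exponent relative to the ~20%-per-test rate observed at 100Mdigit.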
2021-04-17, 17:07 | #15 |
PRP proof file format
V1:
"An attempt at documenting the file format is here: https://github.com/preda/gpuowl/wiki/PRP-Proof-File-Spec" (reproduced from https://mersenneforum.org/showpost.p...6&postcount=87)
(The following first appeared as part of https://mersenneforum.org/showpost.p...8&postcount=95; edited here for correctness and completeness.) Header fields are expressed in 8-bit ASCII, 1 character per byte. Some header records are variable in length. All header fields shown in the example are required, and their format and order must be as described. Except for the first header line, each record is structured as descriptor, "=", value, newline character (0x0A). Numerical values are expressed as unsigned base-ten integers in ASCII. <variable> below means insert the value of the variable. Alpha characters in header descriptors shall be upper case.
First header record is file type: PRP PROOF\n
Second header record is version: VERSION=<version>\n
Third header record is hashsize: HASHSIZE=<hashsize>\n
Fourth header record is power: POWER=<power>\n
Fifth header record is number: NUMBER=<numbertype><exponent>\n
Only hashsize=64 is currently supported. Proof powers 6, 7, 8, 9, and 10 are allowed values; power 8 is preferred. Only Mersenne numbers are currently supported, denoted by numbertype="M". Following the header's last \n, residues B, Middle(0), Middle(1), ..., Middle(power-2), Middle(power-1) follow consecutively, in that order, without separator fields. Each residue is expressed as an unsigned binary multibyte integer, in least-significant-byte-first byte order, in a whole number of bytes, with its MSB zero-extended. The byte count per residue is not present in the header; it is computed from the NUMBER field. The residues' first-byte offsets are computable from the header length, the exponent, and which residue is sought. Abridged example: Code:
PRP PROOF
VERSION=1
HASHSIZE=64
POWER=9
NUMBER=M1257787
V2:
A wider range of proof powers is allowed. (Gpuowl v7.2-53 accepts proof powers 1-10. Gpuowl v6.11-382, v7.2-93, and v7.2-112, which I have compiled and posted, accept proof powers 1 to 12. Mprime/prime95 v30.2? or higher supports powers 5-12, and multiples, e.g. 7x2 to simulate power 8 with a bigger upload but a smaller disk space requirement than 8x1; as I understand it, that's a power 7 proof for each half of the PRP run.) Powers lower than 8 should be used only when necessary, for example on small computing devices such as Raspberry Pi, Intel compute sticks, and other systems with limited space for temporary files, or to make proof generation possible for a run begun without it. The computing-effort optimum (test, proof generation, server, and certification combined) is around power 10 and is exponent-dependent. In general, use the lesser of the optimal power and the maximum practical proof power, to save total computation time. Abridged example: Code:
PRP PROOF
VERSION=2
HASHSIZE=64
POWER=4
NUMBER=M859433
Code:
PRP PROOF
VERSION=2
HASHSIZE=64
POWER=11
NUMBER=M1168999969
Top of this thread https://www.mersenneforum.org/showthread.php?t=24003 Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1 Last fiddled with by kriesel on 2022-11-27 at 19:37 Reason: added prime95 limit optimal power proof file size example, added gpuowl power limits
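The header layout and residue ordering described in the V1 format above are enough to locate every residue in a proof file. Below is a minimal Python sketch under stated assumptions. It holds the whole file in memory and uses ceil(p/8) bytes per residue, stored least-significant byte first. B comes first, then Middle(0) through Middle(power-1). The function names are hypothetical, and nothing here checks the hash, verifies the proof, or handles version-specific differences.

```python
HEADER_KEYS = ["VERSION", "HASHSIZE", "POWER", "NUMBER"]  # required order after "PRP PROOF"


def parse_proof_header(data: bytes) -> tuple[dict, int]:
    """Parse the five newline-terminated ASCII header records.

    Returns (fields, header_len); header_len is the offset just past the
    header's last newline, i.e. where residue B begins."""
    records, pos = [], 0
    for _ in range(5):
        end = data.index(b"\n", pos)            # ValueError if a record is unterminated
        records.append(data[pos:end].decode("ascii"))
        pos = end + 1
    if records[0] != "PRP PROOF":
        raise ValueError("missing 'PRP PROOF' file-type record")
    pairs = [rec.split("=", 1) for rec in records[1:]]
    if [pair[0] for pair in pairs] != HEADER_KEYS:
        raise ValueError("header records missing or out of order")
    fields = dict(pairs)
    if not fields["NUMBER"].startswith("M"):
        raise ValueError("only Mersenne numbers (NUMBER=M<exponent>) are handled")
    return {
        "version": int(fields["VERSION"]),
        "hashsize": int(fields["HASHSIZE"]),
        "power": int(fields["POWER"]),
        "exponent": int(fields["NUMBER"][1:]),
    }, pos


def residue_bytes(exponent: int) -> int:
    """Whole number of bytes holding a residue mod 2^exponent - 1."""
    return (exponent + 7) // 8


def residue_offset(header_len: int, exponent: int, index: int) -> int:
    """Byte offset of residue `index`: 0 is B, 1 + k is Middle(k)."""
    return header_len + index * residue_bytes(exponent)


def read_residue(data: bytes, header_len: int, exponent: int, index: int) -> int:
    """Decode one residue (least-significant byte first) as a Python int."""
    start = residue_offset(header_len, exponent, index)
    return int.from_bytes(data[start:start + residue_bytes(exponent)], "little")


def expected_file_size(header_len: int, exponent: int, power: int) -> int:
    """Header, plus residue B, plus `power` middle residues."""
    return header_len + (power + 1) * residue_bytes(exponent)
```

Consider the M1257787 power 9 example. Its header is 56 bytes and each residue is 157,224 bytes. So Middle(0) starts at byte 157,280, and a complete file would be 1,572,296 bytes.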