Old 2021-09-17, 23:18   #16
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

Bug list

This is a partial list, mostly by version in which they were first seen. Testing has only involved Mersenne number related capabilities. NO attempt was made at testing on Fermat number capabilities.

V17.0

Gave msec/iter times but labeled them sec/iter. Resolved in a later version.

V18.0
?

V19.0
?

V19.1
?

V20.0
There are several described at https://mersenneforum.org/showpost.p...47&postcount=1.
Upgrade to V20.1 for faster P-1 stage 2 and multiple bug fixes.
At least one bug slipped past brief testing and so is present in V20.1 also. See also the P-1 stage 2 restart issue etc. below.

V20.1
  1. Mislabels an n-bit P-1 factor as having n (base ten) digits.
    Code:
    Found 70-digit factor in Stage 2: 646560662529991467527
    Following examples from V20.0
    Code:
    Found 95-digit factor in Stage 1: 33287662948300610984694812407
    Found 84-digit factor in Stage 2: 15299475858498328182948679
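    Log10(33,287,662,948,300,610,984,694,812,407) = 28.52..., so the factor has 29 decimal digits; Log10(33,287,662,948,300,610,984,694,812,407)/Log10(2) = 94.74..., so it has 95 bits, which is the number printed as "digits".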
    Appears to be corrected in a subsequent update under development.
  2. When there is a restart in P-1 stage 2 (an intended Mlucas stop/restart, or Windows Update or a power failure pulling the rug out from under Linux/WSL and Mlucas), the subsequent result record for the P-1 run that was stopped/restarted in stage 2 has 1970-01-01 midnight as its time stamp, instead of the actual completion time. The <exponent>.stat file entries are OK. The P-1 stage 2 restart code path bypasses the usual inits of calendar_time, which later affects the result output timestamp.
    Appears to be corrected in a subsequent update under development.
  3. More recently, also on Ubuntu/WSL/Win10, I've observed peculiar result line date values such as "4442758-11-21 10:39:25 UTC" on ~2021-10-07, after recovering from a large-memory-related stage 2 Mlucas crash on 10M and 106M exponent runs.
  4. Factors found at a GCD early in stage 2 are reported as if they were found in stage 1, with only stage 1 bound given. Computing the effective stage 2 bound in such a case is not easy or clear.
  5. A factor found after a full but interrupted stage 2 was reported with stage 1 bounds only.
  6. -maxalloc with a percentage that equates to more than ~32 GiB of attempted usage results in a segmentation fault at the beginning of P-1 stage 2. Observed on a 128 GiB RAM AVX system with a Win10/WSL1/Ubuntu 18.04.2 LTS combo. Ernst has been able to reproduce the issue on a KNL/Ubuntu system. Some variables that were typed uint32 will need to be uint64. Until resolved, a workaround is to use less of the available RAM, at some loss of speed. Appears to be corrected in a subsequent update under development.
  7. In P-1 at least, some values that ought to be recalculated for each worktodo item appear to be reused unchanged instead. The number of buffers in stage 2 is the first example seen; it was not recalculated when going from 106M to 334M. Another is the FFT length, which did not get updated from a 1M P-1 task to a 3M P-1 task immediately following. B2start is another variable that gets carried over. Possible workarounds include sorting and segregating assignments into groups of similar exponents, or use of the command line and scripting to run separate program sessions for disparate exponents.
  8. For P-1 and probably other work types, on small exponents for which a stage of computation may complete faster than the checkpoint save interval or stat file update interval, no stage timing is saved to the stat file or displayed on stdout/stderr. This means run time scaling measurements on small exponents cannot be made, except with a stopwatch or the batch file/shell script equivalent.
  9. In self test on enormous FFT lengths (256M - 512M) on 2 models of AVX512 CPUs on Ubuntu/WSL/Win10, one of the 512M radix sets reproducibly produces a segfault crash, preventing production of the mlucas.cfg line that would finish the self test. In Ernst's attempt to reproduce this on bare Ubuntu on AVX512, and perhaps with a later version of the source code, an excessive roundoff error is flagged instead. A workaround is to hand edit the mlucas.cfg file based on console output from other radix sets completed prior to the crash. Since the enormous class of FFT lengths begins at 256M, there's little or no need for them currently in the Mersenne number realm. FFT length 192M is expected to be sufficient for P-1 factoring attempts on OBD candidates.
  10. For the worktodo entry:
    PMinus1=00000000000000000000000000000000,1,2,3000077,-1,8000,1200000
    ...
    Product of Stage 1 prime powers with b1 = 8000 is 11649 bits (183 limbs), vs estimated 12035. Setting PRP_BASE = 3.
    ERROR: at line 1165 of file ../src/mi64.c
    Assertion failed: mi64_shl: zero-length array or shift count >= 64!
    Code needs modification for the case where it incorrectly generates a shift count of 64.
  11. A lingering bug related to relocation-prime handling in P-1 stage 2 restart.
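
For anyone wanting to confirm the bits-versus-digits mislabel in item 1 on their own results, here is a minimal Python sketch. It uses only the three factors quoted above; the helper name factor_sizes is mine, not anything from Mlucas.

```python
# Factors quoted in item 1, paired with the "n-digit" label that was printed.
reported = [
    (70, 646560662529991467527),
    (95, 33287662948300610984694812407),
    (84, 15299475858498328182948679),
]


def factor_sizes(factor):
    """Return (bit length, decimal digit count) of a positive integer."""
    return factor.bit_length(), len(str(factor))


for label, factor in reported:
    bits, digits = factor_sizes(factor)
    print(f"labeled {label}-digit: {bits} bits, {digits} decimal digits")
```

In all three cases the printed label equals the bit length (70, 95, 84), while the true decimal digit counts are 21, 29 and 26.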
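
The 1970-01-01 midnight time stamp in item 2 is what one would expect if a seconds-since-epoch value were left at zero. The sketch below illustrates that in Python; it assumes calendar_time behaves like a Unix time count that stays 0 on the restart path, which I have not confirmed in the source, and format_result_time is a hypothetical name rather than an Mlucas function.

```python
from datetime import datetime, timezone


def format_result_time(calendar_time):
    """Format seconds since the Unix epoch in the style of the result lines."""
    stamp = datetime.fromtimestamp(calendar_time, tz=timezone.utc)
    return stamp.strftime("%Y-%m-%d %H:%M:%S UTC")


# Assumption: a never-initialized calendar_time is 0.
print(format_result_time(0))  # 1970-01-01 00:00:00 UTC
```

This does not explain the odd year values in item 3, which look more like a garbage value than a zero.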
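
As a rough illustration of the sorting-and-segregating workaround in item 7, the Python sketch below orders PMinus1 worktodo lines by exponent and splits them into batches of similar size, so each batch could be run in its own program session. It assumes the exponent is the fourth comma-separated field, as in the item 10 example. The 1.5 ratio threshold, the function names, and all entries except the 3000077 one are hypothetical.

```python
def exponent_of(entry):
    """Return the exponent from a 'PMinus1=aid,k,b,n,c,B1,B2' line (assumed layout)."""
    fields = entry.split("=", 1)[1].split(",")
    return int(fields[3])


def sort_by_exponent(entries):
    """Return the entries ordered by increasing exponent."""
    return sorted(entries, key=exponent_of)


def split_into_batches(entries, max_ratio=1.5):
    """Group sorted entries; a batch ends when an exponent exceeds
    max_ratio times the first exponent in the current batch."""
    batches = []
    for entry in sort_by_exponent(entries):
        if batches and exponent_of(entry) <= max_ratio * exponent_of(batches[-1][0]):
            batches[-1].append(entry)
        else:
            batches.append([entry])
    return batches


AID = "0" * 32
entries = [
    f"PMinus1={AID},1,2,334000001,-1,8000,1200000",  # hypothetical entry
    f"PMinus1={AID},1,2,3000077,-1,8000,1200000",    # entry from item 10
    f"PMinus1={AID},1,2,106000001,-1,8000,1200000",  # hypothetical entry
    f"PMinus1={AID},1,2,3100003,-1,8000,1200000",    # hypothetical entry
]

for number, batch in enumerate(split_into_batches(entries), start=1):
    print(f"batch {number}: {[exponent_of(e) for e in batch]}")
```

Each batch could then be written to its own worktodo file and run separately, so stale values from one exponent size are not carried into a very different one.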

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2021-10-22 at 20:37