2021-08-31, 22:14
ewmayer
2ω=0
 
 
Sep 2002
República de California

11,689 Posts
Mlucas v20.1 available

This is an update release of v20, but with enough changes to warrant a minor-version-number increment. As always, download via the README page.

*** I urge users to delete (or rename) the mlucas.cfg file they are using for runs and run the self-tests using the v20.1 build to generate a fresh one, due to the v20 suboptimal-radix-set selection issue mentioned in the list below. ***

Changes include:
  • The help menu has been scrapped in favor of a help.txt file in the same top-level directory as makemake.sh and primenet.py.
  • Algorithmic improvements which yield a 10-20% faster p-1 stage 2. In my p-1 runs using the initial v20 release, the ratio of time-per-modmul in stage 2 to that in stage 1 was in the 1.35-1.4 range. (We expect stage 2 modmuls to be somewhat slower than stage 1 ones, because they FFT-convolve pairs of distinct inputs, whereas stage 1 does auto-convolutions, i.e. squarings, of a single input; but 1.4x is rather on the large side.) The improved code yields a timing ratio in the 1.15-1.2 range.
  • A bug in the stage 2 computation of the "number of buffers available based on current RAM allocation" was allowing the difference between that value and the number of auxiliary-computation stage 2 buffers (5) to drop below (signed int)0, which yielded nonsense when the result was stored in its target unsigned-int variable. (This led the stage 2 code to try to allocate some 4-billion-plus buffers, resulting in an unable-to-alloc error-exit.) That is now fixed. Also, said number-of-buffers-available computation is now done at the start of each stage 2, rather than just once at run-start.
  • A new command-line option '-pm1_s2_nbuf' allows users to override the above runtime auto-computation and directly set an upper bound on the number of stage 2 memory buffers used. The constraints on this are detailed in the help.txt file. For stage 2 restarts there is an additional constraint related to small-prime relocation: if stage 2 was begun with a multiple of 24 or 40 buffers, the restart value must also be a multiple of the same base count, 24 or 40. This constraint is enforced automatically. If the resulting buffer count exceeds available memory, performance will suffer due to system memory swapping, so this flag should only be used by users who know what they are doing.
  • A fix for 2 bugs brought to my attention by Ken Kriesel:
    1. A suboptimal-radix-set selection bug in the self-testing;
    2. For p-1 factor-found cases, the JSON output written to results.txt was not wrapping the factor in double quotes (currently at most 1 factor is printed, which in rare cases will be the product of 2 prime factors), which caused submission of the result via the manual result-reporting page at mersenne.org to fail. As best I can tell, automated submissions using either the primenet.py script which ships with the Mlucas v20 release or the Dulcet/Connelly enhanced primenet.py script should be fine with or without the quotes, but users are encouraged to upgrade to v20.1 anyway to gain the benefit of the faster stage 2.
  • A fix for a missing null-terminator bug in the p-1 assignment-splitting code, brought to my attention by tdulcet, which caused the Test/PRP member of the resulting assignment pair to contain whatever chars the string buffer in question happened to be holding beyond the (missing) end of the Test/PRP assignment.
  • Reference residues for the 128-240M FFT lengths were incorrect, due to a hidden assumption in one piece of the residue-shift-handling code (which figures out where to inject the -2 of each LL-test iteration into the circularly shifted residue) that amounted to assuming p < 2^31.
  • v20.1 raises the largest testable Mersenne number to match the longstanding Fermat-number limit set by the maximum supported FFT length of 512M. (Note that exponents > 2^32, and thus FFT lengths 256-512M, require '-shift 0' to run.) In practice, this translates to M(p) with p approaching 9 billion. Clearly, full-length primality tests of numbers this large are nowhere near practicable as of this writing, but such moduli can be useful for software and hardware parallel-scaling tests.
  • Miscellaneous additional minor bug- and pretty-print fixes.
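As a quick sanity check on the stage 2 timing figures in the list above, assuming the stage 1 time per modmul is the same in v20 and v20.1, the drop in the stage 2 : stage 1 timing ratio translates into these reductions in stage 2 time per modmul (best case, worst case, and the two extremes of the quoted ranges):

```latex
1 - \frac{1.20}{1.40} \approx 14.3\%, \qquad
1 - \frac{1.15}{1.35} \approx 14.8\%, \qquad
1 - \frac{1.20}{1.35} \approx 11.1\%, \qquad
1 - \frac{1.15}{1.40} \approx 17.9\%
```

That is, roughly 11-18%, in line with the quoted 10-20% stage 2 speedup.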
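The stage 2 buffer-count bug is a classic signed-to-unsigned wraparound. The C sketch below reproduces the pattern in isolation; the function names and the clamp-to-zero guard are hypothetical illustrations, not the actual Mlucas code, which may well handle the shortfall differently (e.g. by error-exiting with a clear message).

```c
#include <stdint.h>

/* Number of auxiliary-computation stage 2 buffers, per the release notes. */
#define NUM_AUX_BUFS 5

/* Hypothetical buggy pattern: buffers left for the main stage 2 loop.
   If nbuf_avail < NUM_AUX_BUFS the int difference is negative, and its
   conversion to uint32_t wraps modulo 2^32 to a value above 4 billion. */
static uint32_t stage2_nbuf_buggy(int nbuf_avail) {
    uint32_t nbuf = nbuf_avail - NUM_AUX_BUFS;
    return nbuf;
}

/* Hypothetical guarded pattern: test the sign while still in signed
   arithmetic, and only then convert to unsigned. */
static uint32_t stage2_nbuf_guarded(int nbuf_avail) {
    int diff = nbuf_avail - NUM_AUX_BUFS;
    return (diff > 0) ? (uint32_t)diff : 0u;
}
```

For example, with only 2 buffers' worth of RAM, stage2_nbuf_buggy(2) returns 4294967293 (2^32 - 3), while stage2_nbuf_guarded(2) returns 0.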
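The '-pm1_s2_nbuf' restart rule can be illustrated with a small hypothetical helper. The release notes only say that the multiple-of-24-or-40 constraint is enforced automatically; the round-down policy, the floor of one full block, and the function name below are assumptions made for illustration only.

```c
#include <stdint.h>

/* Hypothetical sketch, not Mlucas code: given the base count (24 or 40)
   that stage 2 was begun with and a user-requested buffer count for the
   restart, return a count that is a multiple of the same base.
   Assumptions: round down, but never below one full block of `base`.
   Returns 0 if base is not one of the two supported values. */
static uint32_t restart_nbuf(uint32_t base, uint32_t requested) {
    if (base != 24 && base != 40)
        return 0;
    uint32_t nbuf = (requested / base) * base;  /* round down to a multiple */
    return (nbuf < base) ? base : nbuf;
}
```

Under these assumptions, a stage 2 begun with a multiple of 40 buffers and restarted with a request for 100 would run with 80.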
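On the JSON side, one general reason to emit a large factor as a quoted string rather than a bare number is that many JSON parsers store numbers as IEEE-754 doubles, which represent integers exactly only up to 2^53. The C sketch below writes a factor as a quoted decimal string; the key names and the function are illustrative assumptions, not the exact results.txt format Mlucas produces.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: format a p-1 factor-found result as one JSON line,
   with the factor written as a quoted decimal string. Key names are
   illustrative only. Returns 0 on success, -1 if the factor string is not
   all decimal digits or the output buffer is too small. */
static int format_pm1_factor_json(char *out, size_t outlen,
                                  unsigned long long exponent,
                                  const char *factor_dec) {
    size_t len = strlen(factor_dec);
    if (len == 0 || strspn(factor_dec, "0123456789") != len)
        return -1;
    int n = snprintf(out, outlen,
                     "{\"exponent\":%llu, \"worktype\":\"PM1\", \"factors\":[\"%s\"]}",
                     exponent, factor_dec);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```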
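The missing-terminator bug is a common C string pitfall: strncpy() writes no '\0' when the source is at least as long as the count, so whatever the destination buffer already held shows through after the copied text. The sketch below contrasts that pattern with an explicitly terminated copy; the function names are hypothetical and this is not the actual Mlucas assignment-splitting code.

```c
#include <string.h>

/* Hypothetical buggy pattern: copy the first len chars of an assignment
   line. If strlen(src) >= len, strncpy writes no terminating '\0', so any
   stale bytes already in dst follow the copied text. */
static void copy_assignment_buggy(char *dst, const char *src, size_t len) {
    strncpy(dst, src, len);
}

/* Hypothetical fixed pattern: clamp to the destination size and always
   write the terminator. Assumes dstlen > 0 and len <= strlen(src). */
static void copy_assignment_fixed(char *dst, size_t dstlen,
                                  const char *src, size_t len) {
    if (len >= dstlen)
        len = dstlen - 1;   /* leave room for the '\0' */
    memcpy(dst, src, len);
    dst[len] = '\0';
}
```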
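For readers unfamiliar with residue shifting: if the LL residue is stored as r = x·2^s mod M(p), squaring doubles the shift, so after each iteration s becomes 2s mod p and the -2 must be injected as -2·2^s, i.e. at bit position (s+1) mod p. The toy C sketch below (not Mlucas code, and using plain modular arithmetic rather than an FFT) checks this bookkeeping against an unshifted LL test. It assumes 3 <= p <= 31 so that all products fit in 64 bits, and it keeps the shift arithmetic in 64-bit integers so that no step relies on 2s fitting in a 32-bit signed int.

```c
#include <stdint.h>

/* Toy sketch, not Mlucas code. Modulus M(p) = 2^p - 1 with 3 <= p <= 31,
   so every product of two residues fits in a uint64_t. */
static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t m) {
    return (a * b) % m;
}

/* 2^e mod m by binary exponentiation. */
static uint64_t pow2mod(uint64_t e, uint64_t m) {
    uint64_t result = 1, base = 2 % m;
    while (e) {
        if (e & 1) result = mulmod(result, base, m);
        base = mulmod(base, base, m);
        e >>= 1;
    }
    return result;
}

/* Plain LL test: x = 4, then x <- x^2 - 2 (mod M(p)), p-2 times. */
static uint64_t ll_plain(uint32_t p) {
    uint64_t m = (1ULL << p) - 1, x = 4;
    for (uint32_t i = 0; i < p - 2; i++)
        x = (mulmod(x, x, m) + m - 2) % m;
    return x;
}

/* Shifted LL test: keep r = x * 2^s (mod M(p)). Since 2^p == 1 (mod M(p)),
   shifts live mod p. Squaring maps s -> 2s mod p, so the -2 becomes
   -2 * 2^s = -2^(s+1), using the updated s. */
static uint64_t ll_shifted(uint32_t p, uint64_t s0) {
    uint64_t m = (1ULL << p) - 1;
    uint64_t s = s0 % p;
    uint64_t r = mulmod(4, pow2mod(s, m), m);
    for (uint32_t i = 0; i < p - 2; i++) {
        s = (2 * s) % p;                         /* 64-bit: no overflow */
        uint64_t minus2 = pow2mod((s + 1) % p, m);
        r = (mulmod(r, r, m) + m - minus2) % m;
    }
    /* Undo the final shift: multiply by 2^(p - s), i.e. 2^(-s) mod M(p). */
    return mulmod(r, pow2mod((p - s) % p, m), m);
}
```

For example, ll_shifted(31, 5) and ll_plain(31) both return 0, since M(31) is prime; for any starting shift the unshifted final residues agree.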
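To put the FFT-length figures in the list above in perspective, dividing the exponent by the FFT length gives the average number of bits each FFT word must carry (taking 1M = 2^20 words):

```latex
\frac{2^{32}}{256\,\mathrm{M}} = \frac{2^{32}}{2^{28}} = 16 \text{ bits/word}, \qquad
\frac{9 \times 10^{9}}{512\,\mathrm{M}} = \frac{9 \times 10^{9}}{2^{29}} \approx 16.8 \text{ bits/word}
```

So exponents from 2^32 up to roughly 9 billion correspond to about 16-17 bits per word across the 256-512M FFT lengths.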
As always, please subscribe to this thread (and unsubscribe from any older Mlucas-release threads) to be notified of any bug and patch reports.

Last fiddled with by ewmayer on 2021-09-02 at 20:38 Reason: primenet.org -> mersenne.org