mersenneforum.org  

Old 2018-06-09, 13:21   #1
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1111110011011₂ Posts
Mlucas-specific reference thread

This thread is intended to hold only reference material specifically for Mlucas.
(Suggestions are welcome. Discussion posts in this thread are not encouraged; please use the reference material discussion thread, http://www.mersenneforum.org/showthread.php?t=23383. Off-topic posts may be moved or removed to keep the reference threads clean, tidy, and useful.)


Download and setup information for Mlucas is located at http://www.mersenneforum.org/mayer/README.html
For Windows 10 or above: install WSL and a Linux distribution for WSL, add build-essential to the Linux distro, then, in WSL Linux, download the Mlucas source, extract the files, compile Mlucas, run the self-tests, etc.
For Windows 8.x or below: not supported; there is no build method for Mlucas V20.x or later. Earlier versions may be compiled using msys2 and then run as native single-threaded Windows applications, on Windows 7 for example (if the OS is supported by msys2, and the CPU instruction set is supported by the Mlucas version).
For Linux, see the readme.
For MacOS or Android, see the readme.
Additional help may be found in the Mlucas subforum.


Table of contents
  1. This post
  2. Save file format as described by ewmayer https://www.mersenneforum.org/showpo...91&postcount=2
  3. Mlucas v17.1 -h help output https://www.mersenneforum.org/showpo...02&postcount=3
  4. Mlucas install script for Linux https://www.mersenneforum.org/showpo...08&postcount=4
  5. Mlucas builds for Linux (or for running on WSL on Windows) https://www.mersenneforum.org/showpo...02&postcount=5
  6. Mlucas builds for Windows https://www.mersenneforum.org/showpo...75&postcount=6
  7. V17.0 apparently normal run https://www.mersenneforum.org/showpo...22&postcount=7
  8. What it may look like when something is not working correctly https://www.mersenneforum.org/showpo...23&postcount=8
  9. Mlucas v19.0 -h help output https://www.mersenneforum.org/showpo...34&postcount=9
  10. Mlucas v19.1 -h help output https://www.mersenneforum.org/showpo...2&postcount=10
  11. Mlucas V20.0 -h help output https://www.mersenneforum.org/showpo...2&postcount=11
  12. Tuning Mlucas V20 https://www.mersenneforum.org/showpo...5&postcount=12
  13. Mlucas V20.1 timings on various hardware and environments, & prime95 compared https://www.mersenneforum.org/showpo...7&postcount=13
  14. Mlucas releases https://www.mersenneforum.org/showpo...7&postcount=14
  15. Wish list https://www.mersenneforum.org/showpo...4&postcount=15
  16. Bug list https://www.mersenneforum.org/showpo...5&postcount=16
  17. V20.1.x P-1 run time scaling https://www.mersenneforum.org/showpo...5&postcount=17
  18. Max exponent versus fft length https://www.mersenneforum.org/showpo...3&postcount=18
  19. Ram required versus exponent for P-1 stage 2 in Mlucas https://www.mersenneforum.org/showpo...8&postcount=19
  20. Optimizing core count for fastest iteration time of a single task https://www.mersenneforum.org/showpo...8&postcount=20
  21. File size scaling https://www.mersenneforum.org/showpo...8&postcount=21
  22. Older versions https://www.mersenneforum.org/showpo...9&postcount=22
  23. tbd etc

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2023-09-28 at 16:03
Old 2018-06-09, 13:49   #2
kriesel
Save file format as described by ewmayer

As posted at http://www.mersenneforum.org/showpos...4&postcount=36 by ewmayer, except as updated in bold, first describing the V17.x format, then the planned V18 format additions:
Quote:
Here is the current Mlucas file format:

o 1 byte for test type, numerically, i.e. 256 possible values, mapped to an internal table;
o 1 byte for modulus type, currently only Mersennes and Fermats supported;
o 8 bytes for iteration count of the residue stored in the file;
o ceiling(p/8) bytes for the residue R - i.e. maximally byte-compact, endian-and-FFT-length-of-run-independent;
o 8 bytes for Res64 = R (mod 2^64), which should match the leading 8 full-residue bytes in the above bytewise form;
o 5 bytes for R (mod 2^35-1);
o 5 bytes for R (mod 2^36-1) [these last 2 a.k.a. the Selfridge-Hurwitz residues, based on those guys' Fermat-number work, using a 36-bit-hardware-integer machine; SH also used R (mod 2^36), but that is just the low 36 bits of GIMPS' Res64];

After reading R, I directly compute the two SH residues and compare to the above file-stored checksums; this gives me an md5/sha1-style integrity check of the whole residue R, which the Res64 does not.

For v18, I am adding several new fields:
o 3 bytes (was 4) for FFT-length-in-K which the code was using at time of savefile write. This is so that if the code switches to a larger-than-default FFT length mid-run based on ROE behavior for the exponent in question, it will immediately resume using the larger FFT length on restart-from-interrupt, rather than resuming using the smaller default FFT length as the current release does.
o 8 bytes for circular-shift to apply to the (unshifted) residue read from the file. I include the shift-count-at-iteration-of-savefile-write because [a] the code will choose a random shift count at run-start time (i.e. since this is not specified by the Primenet server, it cannot be read from the worktodo file), and [b] it saves the need for taking an initial-shift value s from the savefile and computing s * 2^iter (mod p). I remove the shift from R prior to the savefile write, so in fact there's really no need to store s to the file (i.e. I could resume-from-interrupt using an entirely different random shift value, applied to R after reading it from the savefile), but for aesthetic reasons I like the idea of doing the whole run based on a single initial value of s, rather than as-many-values-of-s-as-there-were-run-interrupts.
For V19, LL save file format is the same as for V18, while for PRP, per https://www.mersenneforum.org/showpo...50&postcount=6, additionally, following those fields, are:
  • ceiling(p/8) bytes for a second full-length residue byte-array, this one holding the accumulated Gerbicz checkproduct G - i.e. maximally byte-compact, endian-and-FFT-length-of-run-independent;
  • 8 bytes for G (mod 2^64), which should match the leading 8 full-residue bytes in the above bytewise form;
  • 5 bytes for G (mod 2^35-1);
  • 5 bytes for G (mod 2^36-1)
The residue byte-arrays are least significant byte first.
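
The three checksum fields can be recomputed from a residue byte-array in a few lines. This is only a minimal sketch, not the Mlucas code: the function name residue_checksums and the toy exponent are made up for illustration, and the only format fact it relies on is the least-significant-byte-first ordering stated above.

```python
def residue_checksums(residue_bytes: bytes) -> dict:
    """Recompute Res64 and the two Selfridge-Hurwitz residues.

    Assumes the residue byte-array is least significant byte first.
    The function name and dict keys are hypothetical, not Mlucas identifiers.
    """
    r = int.from_bytes(residue_bytes, "little")
    return {
        "res64": r % (1 << 64),        # R (mod 2^64): the low 64 bits of R
        "sh35": r % ((1 << 35) - 1),   # R (mod 2^35-1)
        "sh36": r % ((1 << 36) - 1),   # R (mod 2^36-1)
    }


# Toy input: the value 3 stored for a hypothetical exponent p = 89,
# which needs ceiling(89/8) = 12 bytes.
toy = (3).to_bytes(12, "little")
sums = residue_checksums(toy)
print(sums)
```

Comparing such recomputed values with the ones stored in the file is the integrity check ewmayer describes; a mismatch in either Selfridge-Hurwitz residue points to a damaged residue even when Res64 happens to agree.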

I note that the exponent p itself is in the file name, not in the contents. Also note that the Mlucas V19 PRP implementation is type 1 residues only.

Save file names are p<exponent>, q<exponent>, p<exponent>.10M, etc. For example, for M332220523: p332220523, q332220523, p332220523.10M, and eventually p332220523.20M and so on.

File sizes derived from the preceding are:
  • V17.x: 28 + ceiling(p/8) bytes
  • V18, and V19 LL: 39 + ceiling(p/8) bytes
  • V19 PRP: 57 + 2 * ceiling(p/8) bytes
Size of the V17 p332220523 file is 41527594 bytes; check. (Size allocated on disk is larger due to the disk block size.)
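
The size formulas follow directly from summing the field widths listed earlier in this post. A minimal sketch of that arithmetic follows; the function name and the format labels ("v17", "v18", "v19-ll", "v19-prp") are my own shorthand, not anything Mlucas uses.

```python
def savefile_size(p: int, fmt: str) -> int:
    """Expected save file size in bytes, from the field widths listed above.

    fmt is a hypothetical label: "v17", "v18", "v19-ll" or "v19-prp".
    """
    n = (p + 7) // 8           # ceiling(p/8) bytes for one full residue
    header = 1 + 1 + 8         # test type, modulus type, iteration count
    checks = 8 + 5 + 5         # Res64 plus the two Selfridge-Hurwitz residues
    v17 = header + n + checks  # 28 + ceiling(p/8)
    if fmt == "v17":
        return v17
    v18 = v17 + 3 + 8          # adds FFT length in K and the shift count
    if fmt in ("v18", "v19-ll"):
        return v18
    if fmt == "v19-prp":
        return v18 + n + checks  # adds the Gerbicz check residue and its checksums
    raise ValueError(f"unknown format label: {fmt}")


print(savefile_size(332220523, "v17"))
```

For p = 332220523 the V17 result is 41527594 bytes, matching the observed file size quoted above.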

(No data yet on V19.1, V20, etc.)


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2021-01-20 at 17:10 Reason: update for V19 PRP type etc
Old 2019-08-12, 17:07   #3
kriesel
Mlucas v17.1 -h help output

Code:
Mlucas 17.1

    http://hogranch.com/mayer/README.html

INFO: testing qfloat routines...
CPU Family = x86_64, OS = Linux, 64-bit Version, compiled with Gnu C [or other compatible], Version 8.2.0.
INFO: CPU supports SSE2 instruction set, but using scalar floating-point build.
INFO: Using inline-macro form of MUL_LOHI64.
INFO: MLUCAS_PATH is set to ""
INFO: using 64-bit-significand form of floating-double rounding constant for scalar-mode DNINT emulation.
Setting DAT_BITS = 10, PAD_BITS = 2
INFO: testing IMUL routines...
INFO: testing FFT radix tables...
For the full list of command line options, run the program with the -h flag.

Mlucas command line options:

         Symbol and abbreviation key:
               <CR> :  carriage return
                |   :  separator for one-of-the-following multiple-choice menus
               []   :  encloses optional arguments
               {}   :  denotes user-supplied numerical arguments of the type noted.
                      ({int} means nonnegative integer, {+int} = positive int, {float} = float.)
          -argument :  Vertical stacking indicates argument short 'nickname' options,
          -arg      :  e.g. in this example '-arg' can be used in place of '-argument'.

         Supported arguments:

         <CR>        Default mode: looks for a worktodo.ini file in the local
                     directory; if none found, prompts for manual keyboard entry

Help submenus by topic. No additional arguments may follow the displayed ones:
 -s            Post-build self-testing for various FFT-length rnages.
 -fftlen       FFT-length setting.
 -radset       FFT radix-set specification.
 -m[ersenne]   Mersenne-number primality testing.
 -f[ermat]     Fermat-number primality testing.
 -iters        Iteration-number setting.
 -nthread|cpu  Setting threadcount and CPU core affinity.

 *** NOTE: *** The following self-test options will cause an mlucas.cfg file containing
     the optimal FFT radix set for the runlength(s) tested to be created (if one did not
     exist previously) or appended (if one did) with new timing data. Such a file-write is
     triggered by each complete set of FFT radices available at a given FFT length being
     tested, i.e. by a self-test without a user-specified -radset argument.
     (A user-specific Mersenne exponent may be supplied via the -m flag; if none is specified,
     the program will use the largest permissible exponent for the given FFT length, based on
     its internal length-setting algorithm). The user must specify the number of iterations for
     the self-test via the -iters flag; while it is not required, it is strongly recommended to
     stick to one of the standard timing-test values of -iters = [100,1000,10000], with the larger
     values being preferred for multithreaded timing tests, in order to assure a decently large
     slice of CPU time. Similarly, it is recommended to not use the -m flag for such tests, unless
     roundoff error levels on a given compute platform are such that the default exponent at one or
     more FFT lengths of interest prevents a reasonable sampling of available radix sets at same.
        If the user lets the program set the exponent and uses one of the aforementioned standard
     self-test iteration counts, the resulting best-timing FFT radix set will only be written to the
     resulting mlucas.cfg file if the timing-test result matches the internally- stored precomputed
     one for the given default exponent at the iteration count in question, with eligible radix sets
     consisting of those for which the roundoff error remains below an acceptable threshold.
     If the user instead specifies the exponent (only allowed for a single-FFT-length timing test)****************
     and/or a non-default iteration number, the resulting best-timing FFT radix set will only be
     written to the resulting mlucas.cfg file if the timing-test results match each other? ********* check logic here *******
     This is important for tuning code parameters to your particular platform.

   FOR BEST RESULTS, RUN ANY SELF-TESTS UNDER ZERO- OR CONSTANT-LOAD CONDITIONS

 -s {...}    Self-test, user must also supply exponent [via -m or -f] and/or FFT length to use.

 -s tiny     Runs 100-iteration self-tests on set of  32 Mersenne exponents, ranging from 173431 to 2455003
 -s t        This will take around 1 minute on a fast CPU..

 -s small    Runs 100-iteration self-tests on set of  24 Mersenne exponents, ranging from 173431 to 1245877
 -s s        This will take around 10 minutes on a fast CPU..

**** THIS IS THE ONLY SELF-TEST ORDINARY USERS ARE RECOMMENDED TO DO: ******
*                                                                          *
* -s medium   Runs set of  24 Mersenne exponents, ranging from 1327099 to 9530803
* -s m        This will take around an hour on a fast CPU.                 *
*                                                                          *
****************************************************************************

 -s large    Runs set of  24 Mersenne exponents, ranging from 10151971 to 72851621
 -s l        This will take around an hour on a fast CPU.

 -s huge     Runs set of  16 Mersenne exponents, ranging from 77597293 to 282508657
 -s h        This will take a couple of hours on a fast CPU.

 -s all      Runs 100-iteration self-tests of all test Mersenne exponents and all FFT radix sets.
 -s a        This will take several hours on a fast CPU.

 -fftlen {+int}   If {+int} is one of the available FFT lengths (in Kilodoubles), runs all
             all available FFT radices available at that length, unless the -radset flag is
             invoked (see below for details). If -fftlen is invoked without the -iters flag,
             it is assumed the user wishes to do a production run with a non-default FFT length,
             In this case the program requires a valid worktodo.ini-file entry with exponent
             not more than 5% larger than the default maximum for that FFT length.
                  If -fftlen is invoked with a user-supplied value of -iters but without a
             user-supplied exponent, the program will do the specified number of iterations
             using the default self-test Mersenne or Fermat exponent for that FFT length.
                  If -fftlen is invoked with a user-supplied value of -iters and either the
             -m or -f flag and a user-supplied exponent, the program will do the specified
             number of iterations of either the Lucas-Lehmer test with starting value 4 (-m)
             or the Pe'pin test with starting value 3 (-f) on the user-specified modulus.

             In either of the latter 2 cases, the program will produce a cfg-file entry based
             on the timing results, assuming at least one radix set ran the specified #iters
             to completion without suffering a fatal error of some kind.
             Use this to find the optimal radix set for a single FFT length on your hardware.

             NOTE: IF YOU USE OTHER THAN THE DEFAULT MODULUS OR #ITERS FOR SUCH A SINGLE-FFT-
             LENGTH TIMING TEST, IT IS UP TO YOU TO MANUALLY VERIFY THAT THE RESIDUES OUTPUT
             MATCH FOR ALL FFT RADIX COMBINATIONS AND THE ROUNDOFF ERRORS ARE REASONABLE!

 -radset {int}    Specific index of a set of complex FFT radices to use, based on the big
             select table in the function get_fft_radices(). Requires a supported value of
             -fftlen to also be specified, as well as a value of -iters for the timing test.

 -m [{+int}] Performs a Lucas-Lehmer primality test of the Mersenne number M(int) = 2^int - 1,
             where int must be an odd prime. If -iters is also invoked, this indicates a timing test.
             and requires suitable added arguments (-fftlen and, optionally, -radset) to be supplied.
                If the -fftlen option (and optionally -radset) is also invoked but -iters is not, the
             program first checks the first line of the worktodo.ini file to see if the assignment
             specified there is a Lucas-Lehmer test with the same exponent as specified via the -m
             argument. If so, the -fftlen argument is treated as a user override of the default FFT
             length for the exponent. If -radset is also invoked, this is similarly treated as a user-
             specified radix set for the user-set FFT length; otherwise the program will use the cfg file
             to select the radix set to be used for the user-forced FFT length.
                If the worktodo.ini file entry does not match the -m value, a set of timing self-tests is
             run on the user-specified Mersenne number using all sets of FFT radices available at the
             specified FFT length.
                If the -fftlen option is not invoked, the self-tests use all sets of
             FFT radices available at that exponent's default FFT length.
                Use this to find the optimal radix set for a single given Mersenne number
             exponent on your hardware, similarly to the -fftlen option.
                Performs as many iterations as specified via the -iters flag [required].

 -f {int}    Performs a base-3 Pe'pin test on the Fermat number F(num) = 2^(2^num) + 1.
                If desired this can be invoked together with the -fftlen option.
             as for the Mersenne-number self-tests (see notes about the -m flag;
             note that not all FFT lengths supported for -m are available for -f).
             Optimal radix sets and timings are written to a fermat.cfg file.
                Performs as many iterations as specified via the -iters flag [required].

 -iters {int}   Do {int} self-test iterations of the type determined by the
             modulus-related options (-s/-m = Lucas-Lehmer test iterations with
             initial seed 4, -f = Pe'pin-test squarings with initial seed 3.

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2019-11-18 at 14:34
Old 2020-05-20, 19:07   #4
kriesel
Mlucas install script for Linux

Haven't tried it myself, but there's a post about one at https://mersenneforum.org/showpost.p...9&postcount=34
A second version of that is at https://mersenneforum.org/showpost.php?p=569920&postcount=83; third version at https://mersenneforum.org/showpost.p...0&postcount=89


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2021-02-01 at 19:46
Old 2020-11-19, 22:32   #5
kriesel
Mlucas builds for Linux (or for running on WSL on Windows)

How I built Mlucas (v19) in WSL / Ubuntu 18.04 for multiple processor types
(rename the executable between builds to identify the flavor)
Note these are mostly untested.

basic x86-64, & presumably the best bet for Knights Corner Xeon Phi:
Code:
gcc -c -O3 -DUSE_THREADS ../src/*.c >& build.log
grep error build.log
gcc -o Mlucas *.o -lm -lpthread -lrt
SSE2 such as Xeon X5650, E5645, E5-26xx
Code:
gcc -c -O3 -DUSE_SSE2 -DUSE_THREADS ../src/*.c >& build.log
grep error build.log
gcc -o Mlucas *.o -lm -lpthread -lrt
AVX2 such as i7-7500U, i7-8750H
Code:
gcc -c -O3 -DUSE_AVX2 -mavx2 -DUSE_THREADS ../src/*.c >& build.log
grep error build.log
gcc -o Mlucas *.o -lm -lpthread -lrt
AVX-512 such as (Knights Landing MIC) Xeon Phi 7250
Code:
gcc -c -O3 -DUSE_AVX512 -march=knl -DUSE_THREADS ../src/*.c >& build.log
grep error build.log
gcc -o Mlucas *.o -lm -lpthread -lrt
AVX-512 such as i5-1035G1, i7-1165G7
Code:
gcc -c -O3 -DUSE_AVX512 -march=skylake-avx512 -DUSE_THREADS ../src/*.c >& build.log
grep error build.log
gcc -o Mlucas *.o -lm -lpthread -lrt
The above are for Linux multithreaded build/run environments. For Windows single-threaded end use, see the next post.
https://www.mersenneforum.org/mayer/README.html
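
Picking among the compile lines above can be automated from the CPU feature flags. The following is a rough sketch under stated assumptions, not a tested build tool: the function names are hypothetical, the flag names (sse2, avx2, avx512f, avx512er) are the ones Linux reports in /proc/cpuinfo, and treating avx512er as the marker for Knights Landing is my assumption. The gcc options themselves are copied from the commands above.

```python
def mlucas_build_flags(cpu_flags: set, threads: bool = True) -> list:
    """Return the gcc compile command (without sources) for a set of CPU flags.

    Hypothetical helper; the option sets mirror the commands listed above.
    """
    if "avx512f" in cpu_flags:
        # Assumption: avx512er is present on Knights Landing but not on
        # Skylake-style AVX-512 CPUs.
        march = "knl" if "avx512er" in cpu_flags else "skylake-avx512"
        flags = ["-DUSE_AVX512", f"-march={march}"]
    elif "avx2" in cpu_flags:
        flags = ["-DUSE_AVX2", "-mavx2"]
    elif "sse2" in cpu_flags:
        flags = ["-DUSE_SSE2"]
    else:
        flags = []  # basic scalar build
    if threads:
        flags.append("-DUSE_THREADS")
    return ["gcc", "-c", "-O3", *flags]


def read_cpu_flags(path: str = "/proc/cpuinfo") -> set:
    """Read the first 'flags' line of /proc/cpuinfo (Linux on x86 only)."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()


# Example with a literal flag set, as an AVX2-capable CPU might report:
cmd = mlucas_build_flags({"sse2", "avx", "avx2", "fma"})
print(" ".join(cmd + ["../src/*.c"]))
```

On a real system, mlucas_build_flags(read_cpu_flags()) would replace the literal set. For the Windows single-threaded builds in the next post, pass threads=False.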

Attachments are Mlucas v19 builds intended for Linux and were built on Ubuntu v18.04 running on WSL / Win10 on an i7-8750H.


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: gz mlucas-avx512-knl-mt.tar.gz (1.84 MB, 258 views)
File Type: gz mlucas-avx512-skylake-mt.tar.gz (1.82 MB, 260 views)
File Type: gz mlucas-fma3-mt.tar.gz (1.86 MB, 261 views)
File Type: gz mlucas-sse2-mt.tar.gz (1.70 MB, 273 views)
File Type: gz mlucas-x86-mt.tar.gz (1.72 MB, 264 views)

Last fiddled with by kriesel on 2023-04-15 at 12:58 Reason: minor edit
Old 2020-11-27, 15:01   #6
kriesel
Mlucas builds for Windows

Building Mlucas v19 for Windows in msys2 is similar to building for Linux or WSL, except:
remove -DUSE_THREADS and -lpthread for Windows single-threaded end use.
How I built or attempted in msys2 for Windows single-threaded environments:

SSE2 such as Xeon X5650, E5645, E5-26xx
Code:
gcc -c -O3 -DUSE_SSE2 ../src/*.c >& build.log
grep error build.log
gcc -o Mlucas-sse2 *.o -lm -lrt
x86-64
Code:
gcc -c -O3 ../src/*.c >& build.log
grep error build.log
gcc -o Mlucas-x86 *.o -lm -lrt
AVX2 such as i7-7500U, i7-8750H
Code:
gcc -c -O3 -DUSE_AVX2 -mavx2 ../src/*.c >& build.log
grep error build.log
gcc -o Mlucas-fma3 *.o -lm -lrt
AVX512 such as i5-1035G1
Code:
gcc -c -O3 -DUSE_AVX512 -march=skylake-avx512 ../src/*.c >& build.log
grep error build.log
gcc -o Mlucas-avx512 *.o -lm -lrt
https://www.mersenneforum.org/mayer/README.html

Attachments are single-threaded Mlucas v19 builds intended for Windows 7 or higher, and were built in msys2 running on Windows 7 Pro 64-bit on a dual-Xeon-E5645 HP Z600.

(Note, because of changes in the software requirements, Mlucas v20.x no longer will build with this method. So there currently is no documented path to producing actual Windows executables for Mlucas v20.x or presumably mfactor v20.x.)


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: zip mlucas-x86.zip (1.73 MB, 255 views)
File Type: zip mlucas-sse2.zip (1.75 MB, 251 views)
File Type: zip mlucas-fma3.zip (1.84 MB, 262 views)

Last fiddled with by kriesel on 2023-04-15 at 12:59 Reason: minor edit
Old 2021-01-20, 17:26   #7
kriesel
V17.0 apparently normal run

From the beginning of a .stat file (V17.0 I think):

Code:
INFO: no restart file found...starting run from scratch.
M332220523: using FFT length 18432K = 18874368 8-byte floats.
 this gives an average   17.601676676008438 bits per digit
Using complex FFT radices        36        16        16        32        32
[Jul 22 13:31:39] M332220523 Iter# = 10000 [ 0.00% complete] clocks = 02:16:04.515 [816.4516 sec/iter] Res64: 1A313D709BFA6663. AvgMaxErr = 0.171224865. MaxErr = 0.250000000.
[Jul 22 15:47:01] M332220523 Iter# = 20000 [ 0.01% complete] clocks = 02:15:19.019 [811.9019 sec/iter] Res64: 73DC7A5C8B839081. AvgMaxErr = 0.171704563. MaxErr = 0.234375000.
[Jul 22 18:02:29] M332220523 Iter# = 30000 [ 0.01% complete] clocks = 02:15:25.632 [812.5633 sec/iter] Res64: B928CD22434EEC7C. AvgMaxErr = 0.171970012. MaxErr = 0.234375000.
[Jul 22 20:17:54] M332220523 Iter# = 40000 [ 0.01% complete] clocks = 02:15:22.185 [812.2186 sec/iter] Res64: 307ECB47139AEB31. AvgMaxErr = 0.172004342. MaxErr = 0.250000000.
[Jul 22 22:33:29] M332220523 Iter# = 50000 [ 0.02% complete] clocks = 02:15:32.470 [813.2471 sec/iter] Res64: 3F64ED9E01C13B1D. AvgMaxErr = 0.171687719. MaxErr = 0.218750000.
[Jul 23 00:49:15] M332220523 Iter# = 60000 [ 0.02% complete] clocks = 02:15:43.121 [814.3121 sec/iter] Res64: B238D7DE50AFACED. AvgMaxErr = 0.171868494. MaxErr = 0.281250000.
[Jul 23 03:04:36] M332220523 Iter# = 70000 [ 0.02% complete] clocks = 02:15:18.738 [811.8738 sec/iter] Res64: 892C20B5F5C4776C. AvgMaxErr = 0.171980529. MaxErr = 0.234375000.
[Jul 23 05:20:00] M332220523 Iter# = 80000 [ 0.02% complete] clocks = 02:15:20.844 [812.0844 sec/iter] Res64: 6374CC678224D058. AvgMaxErr = 0.171895016. MaxErr = 0.250000000.
[Jul 23 07:35:13] M332220523 Iter# = 90000 [ 0.03% complete] clocks = 02:15:10.622 [811.0622 sec/iter] Res64: 393DCD2788664405. AvgMaxErr = 0.172066525. MaxErr = 0.250000000.
[Jul 23 09:51:21] M332220523 Iter# = 100000 [ 0.03% complete] clocks = 02:16:05.131 [816.5132 sec/iter] Res64: 91B688264B5B3F39. AvgMaxErr = 0.171926060. MaxErr = 0.250000000.
Some behaviors to note in this single-threaded run:
  1. clocks is close to but less than the elapsed time (the difference in wall clock time from the previous status output line). This version incorrectly labeled msec/iter values as sec/iter.
  2. The clocks value fluctuates somewhat, as differing data occasionally causes differing code branches, or instruction timing varies with differing operands.
  3. AvgMaxErr < 0.25 but definitely above zero
  4. MaxErr around 0.25 (0.21875 to 0.28125 here) but definitely above zero
  5. AvgMaxErr differing from line to line
  6. MaxErr differing from line to line
  7. Res64 differing from line to line, seemingly random, not repeating or alternating
  8. If this were a version that used a residue shift, the shift count would also vary from line to line.
When in doubt, try running and matching interim residues for known primes or other established values. See the mlucas.c source code, https://www.mersenneforum.org/showpo...82&postcount=4 for LL, https://www.mersenneforum.org/showpo...83&postcount=5 for PRP3, or for large exponents https://www.mersenneforum.org/showpo...83&postcount=5
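
The timing observations in items 1 and 2 can be checked by hand from the first two status lines. A small sketch of that arithmetic, using only numbers copied from the log above (the helper name hms_to_seconds is made up, and the subtraction assumes both lines fall on the same day):

```python
def hms_to_seconds(text: str) -> float:
    """Convert 'HH:MM:SS' or 'HH:MM:SS.mmm' to seconds."""
    h, m, s = text.split(":")
    return int(h) * 3600 + int(m) * 60 + float(s)


# Timestamps of the Iter# = 10000 and Iter# = 20000 lines (same day).
elapsed = hms_to_seconds("15:47:01") - hms_to_seconds("13:31:39")

# clocks value reported on the Iter# = 20000 line.
clocks = hms_to_seconds("02:15:19.019")

# 10000 iterations separate consecutive status lines in this log.
ms_per_iter = clocks / 10000 * 1000.0

# Exponent divided by the FFT length in 8-byte floats.
bits_per_digit = 332220523 / 18874368

print(elapsed, clocks)
print(round(ms_per_iter, 4))
print(bits_per_digit)
```

The clocks value (8119.019 s) comes out slightly under the 8122 s of wall clock time, and the per-iteration figure is about 811.9 ms, which is the number the log prints with a sec/iter label.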


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2021-09-01 at 00:18
Old 2021-01-20, 17:47   #8
kriesel
What it may look like when something is not working correctly

See https://mersenneforum.org/showpost.p...8&postcount=76
Any of the following is reason to view the interim or final results with suspicion. The original poster was correct to doubt the accuracy of the run. Some of these will also apply to other software.
  1. Anomalously fast iterations
  2. clocks value exactly the same from line to line (or equivalently, ms/iter value)
  3. clocks value >> elapsed time between lines
  4. Res64 value repeating exactly line after line
  5. AvgMaxErr repeating exactly line after line
  6. AvgMaxErr = 0.
  7. MaxErr repeating exactly line after line
  8. MaxErr = 0.
  9. Residue shift count repeating exactly line after line. Consider a computation where you square a number, but first shift it left by n bits (n doublings). The square result will be shifted 2n, not n. Now consider doing that 10,000 times, modulo Mp.
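
The shift argument in item 9 can be demonstrated with small numbers. This is an illustrative sketch only, using plain modular squaring on a small Mersenne modulus rather than any Mlucas routine; the function names are hypothetical. It relies on 2^p = 1 (mod Mp), so multiplying by 2^s is a circular shift by s bits and the shift count can be reduced mod p.

```python
def shift_history(p: int, s0: int, iterations: int) -> list:
    """Shift count after each squaring: it doubles mod p every iteration."""
    history = [s0]
    for _ in range(iterations):
        history.append((2 * history[-1]) % p)
    return history


def verify_shift_doubling(p: int, x: int, s: int, iterations: int) -> bool:
    """Square a shifted and an unshifted residue side by side, mod 2^p - 1.

    Returns True if the shifted value always equals the unshifted value
    shifted by the doubled count.
    """
    mp = (1 << p) - 1
    shifted = (x << s) % mp
    for _ in range(iterations):
        x = (x * x) % mp
        shifted = (shifted * shifted) % mp
        s = (2 * s) % p
        if shifted != (x << s) % mp:
            return False
    return True


print(shift_history(127, 5, 5))
print(verify_shift_doubling(127, 3, 5, 20))
```

Because p is an odd prime, a nonzero shift count never maps to itself under doubling mod p, which is why a shift count that stays frozen in the status output is a warning sign.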
This list suggests some possible additional error checks that could be incorporated at low cost. The values are being generated anyway for periodic output to log files. Variables that fail simple statistical tests could generate warnings or halts.
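
As a rough idea of how cheap such checks could be, here is a sketch that scans status lines in the V17-style format shown in the previous post and flags fields that never change or that are zero. It is an assumption-laden illustration, not part of Mlucas: the regular expression, function name, and the choice to require at least 4 lines are mine, and the "suspicious" input is simply one real line repeated, constructed for the demonstration.

```python
import re

# Matches the clocks, Res64, AvgMaxErr and MaxErr fields of a status line.
STATUS_RE = re.compile(
    r"clocks = (?P<clocks>\S+) .*?Res64: (?P<res64>[0-9A-Fa-f]{16})\. "
    r"AvgMaxErr = (?P<avg>\d+\.\d+)\. MaxErr = (?P<max>\d+\.\d+)\."
)


def check_status_lines(lines, min_lines=4):
    """Return a list of warning strings for a window of status lines."""
    rows = [m.groupdict() for m in map(STATUS_RE.search, lines) if m]
    warnings = []
    if len(rows) < min_lines:
        return warnings  # too little data to judge
    for field in ("clocks", "res64", "avg", "max"):
        if len({row[field] for row in rows}) == 1:
            warnings.append(f"{field} identical on all {len(rows)} lines")
    for field in ("avg", "max"):
        if any(float(row[field]) == 0.0 for row in rows):
            warnings.append(f"{field} is zero on at least one line")
    return warnings


# Four lines copied from the apparently normal V17.0 run.
normal = [
    "[Jul 22 13:31:39] M332220523 Iter# = 10000 [ 0.00% complete] clocks = 02:16:04.515 [816.4516 sec/iter] Res64: 1A313D709BFA6663. AvgMaxErr = 0.171224865. MaxErr = 0.250000000.",
    "[Jul 22 15:47:01] M332220523 Iter# = 20000 [ 0.01% complete] clocks = 02:15:19.019 [811.9019 sec/iter] Res64: 73DC7A5C8B839081. AvgMaxErr = 0.171704563. MaxErr = 0.234375000.",
    "[Jul 22 18:02:29] M332220523 Iter# = 30000 [ 0.01% complete] clocks = 02:15:25.632 [812.5633 sec/iter] Res64: B928CD22434EEC7C. AvgMaxErr = 0.171970012. MaxErr = 0.234375000.",
    "[Jul 22 20:17:54] M332220523 Iter# = 40000 [ 0.01% complete] clocks = 02:15:22.185 [812.2186 sec/iter] Res64: 307ECB47139AEB31. AvgMaxErr = 0.172004342. MaxErr = 0.250000000.",
]
# Constructed for illustration: the same line repeated four times.
suspicious = [normal[0]] * 4

print(check_status_lines(normal))
print(check_status_lines(suspicious))
```

The check only warns when a field is identical across the whole window, since MaxErr legitimately repeats on adjacent lines in a healthy run.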

For comparison, brief alternate runs in gpuowl on the same exponent follow. Note that the res64 values match at 200K and 310K, and no GEC errors were logged, on these independent runs, indicating high reliability.
Code:
2021-01-20 12:19:01 gpuowl v6.11-380-g79ea0cc
2021-01-20 12:19:01 config: -user kriesel -cpu asr2/radeonvii3 -d 3 -use NO_ASM -maxAlloc 13000 -log 10000
2021-01-20 12:19:01 device 3, unique id ''
2021-01-20 12:19:01 asr2/radeonvii3 110899639 FFT: 6M 1K:12:256 (17.63 bpw)
2021-01-20 12:19:01 asr2/radeonvii3 Expected maximum carry32: 39160000
2021-01-20 12:19:02 asr2/radeonvii3 OpenCL args "-DEXP=110899639u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=12u -DPM1=0 -DAMDGPU=1 -DWEIGHT_STEP_MINUS_1=0x9.70d2e6d4d6eb8p-5 -DIWEIGHT_STEP_MINUS_1=-0xe.947b562a8bfep-6 -DNO_ASM=1  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2021-01-20 12:19:07 asr2/radeonvii3 OpenCL compilation in 4.48 s
2021-01-20 12:19:08 asr2/radeonvii3 110899639 OK        0 loaded: blockSize 400, 0000000000000003
2021-01-20 12:19:08 asr2/radeonvii3 validating proof residues for power 8
2021-01-20 12:19:08 asr2/radeonvii3 Proof using power 8
2021-01-20 12:19:09 asr2/radeonvii3 110899639 OK      800   0.00%;  881 us/it; ETA 1d 03:09; 6191b4b775c8edca (check 0.59s)
2021-01-20 12:19:18 asr2/radeonvii3 110899639 OK    10000   0.01%;  882 us/it; ETA 1d 03:11; 59d707dfd3e8a6e5 (check 0.59s)
2021-01-20 12:19:27 asr2/radeonvii3 110899639 OK    20000   0.02%;  884 us/it; ETA 1d 03:13; 59d112f7284edbb4 (check 0.59s)
2021-01-20 12:19:37 asr2/radeonvii3 110899639 OK    30000   0.03%;  883 us/it; ETA 1d 03:12; b9114934905a8443 (check 0.59s)
2021-01-20 12:19:46 asr2/radeonvii3 110899639 OK    40000   0.04%;  882 us/it; ETA 1d 03:09; f5e1840cc2c9ae6f (check 0.59s)
2021-01-20 12:19:56 asr2/radeonvii3 110899639 OK    50000   0.05%;  881 us/it; ETA 1d 03:08; 3a6a9896d868f34e (check 0.60s)
2021-01-20 12:20:05 asr2/radeonvii3 110899639 OK    60000   0.05%;  882 us/it; ETA 1d 03:09; 581477e4ea2f2fd5 (check 0.62s)
2021-01-20 12:20:15 asr2/radeonvii3 110899639 OK    70000   0.06%;  897 us/it; ETA 1d 03:37; 76171fa52b081f88 (check 0.60s)
2021-01-20 12:20:24 asr2/radeonvii3 110899639 OK    80000   0.07%;  885 us/it; ETA 1d 03:15; b87b7c28d301446e (check 0.61s)
2021-01-20 12:20:34 asr2/radeonvii3 110899639 OK    90000   0.08%;  886 us/it; ETA 1d 03:16; 084955167e9c1678 (check 0.62s)
2021-01-20 12:20:43 asr2/radeonvii3 110899639 OK   100000   0.09%;  886 us/it; ETA 1d 03:17; 6866029ebdf6f42f (check 0.60s)
2021-01-20 12:20:53 asr2/radeonvii3 110899639 OK   110000   0.10%;  884 us/it; ETA 1d 03:13; 00fb4982ad9a9ac6 (check 0.65s)
2021-01-20 12:21:02 asr2/radeonvii3 110899639 OK   120000   0.11%;  887 us/it; ETA 1d 03:19; f2480bc5b17f8151 (check 0.60s)
2021-01-20 12:21:12 asr2/radeonvii3 110899639 OK   130000   0.12%;  884 us/it; ETA 1d 03:13; f1eb30b6262e11ba (check 0.59s)
2021-01-20 12:21:21 asr2/radeonvii3 110899639 OK   140000   0.13%;  880 us/it; ETA 1d 03:05; e334d2375c872f0f (check 0.59s)
2021-01-20 12:21:30 asr2/radeonvii3 110899639 OK   150000   0.14%;  881 us/it; ETA 1d 03:06; 951a0e26b6da9927 (check 0.59s)
2021-01-20 12:21:40 asr2/radeonvii3 110899639 OK   160000   0.14%;  881 us/it; ETA 1d 03:05; ff557ebe567e1f0d (check 0.59s)
2021-01-20 12:21:49 asr2/radeonvii3 110899639 OK   170000   0.15%;  882 us/it; ETA 1d 03:07; 3d30664ec2bf6118 (check 0.59s)
2021-01-20 12:21:59 asr2/radeonvii3 110899639 OK   180000   0.16%;  881 us/it; ETA 1d 03:06; 472b05a96d9ecf1a (check 0.59s)
2021-01-20 12:22:08 asr2/radeonvii3 110899639 OK   190000   0.17%;  880 us/it; ETA 1d 03:04; 12cd354415712251 (check 0.59s)
2021-01-20 12:22:17 asr2/radeonvii3 110899639 OK   200000   0.18%;  882 us/it; ETA 1d 03:07; b56e64d2ec39cd4d (check 0.59s)
2021-01-20 12:22:27 asr2/radeonvii3 110899639 OK   210000   0.19%;  881 us/it; ETA 1d 03:05; f84002c6841db007 (check 0.59s)
2021-01-20 12:22:36 asr2/radeonvii3 110899639 OK   220000   0.20%;  881 us/it; ETA 1d 03:05; 57cdfa904d0b3cda (check 0.59s)
2021-01-20 12:22:46 asr2/radeonvii3 110899639 OK   230000   0.21%;  881 us/it; ETA 1d 03:06; 0307b1331c567a43 (check 0.59s)
2021-01-20 12:22:55 asr2/radeonvii3 110899639 OK   240000   0.22%;  881 us/it; ETA 1d 03:05; c9b34a5047ba285b (check 0.59s)
2021-01-20 12:23:05 asr2/radeonvii3 110899639 OK   250000   0.23%;  882 us/it; ETA 1d 03:06; 3f17202d73f429ee (check 0.60s)
2021-01-20 12:23:14 asr2/radeonvii3 110899639 OK   260000   0.23%;  881 us/it; ETA 1d 03:05; 93d688231b646b99 (check 0.60s)
2021-01-20 12:23:23 asr2/radeonvii3 110899639 OK   270000   0.24%;  881 us/it; ETA 1d 03:04; 0369c93e11d7a67c (check 0.59s)
2021-01-20 12:23:33 asr2/radeonvii3 110899639 OK   280000   0.25%;  881 us/it; ETA 1d 03:04; e2688fd986ab2216 (check 0.59s)
2021-01-20 12:23:42 asr2/radeonvii3 110899639 OK   290000   0.26%;  880 us/it; ETA 1d 03:03; 8040fd8a9dfcf9eb (check 0.59s)
2021-01-20 12:23:52 asr2/radeonvii3 110899639 OK   300000   0.27%;  881 us/it; ETA 1d 03:04; 198cbf452a3e452b (check 0.59s)
2021-01-20 12:24:01 asr2/radeonvii3 110899639 OK   310000   0.28%;  882 us/it; ETA 1d 03:07; 12f9b4443a1ca408 (check 0.59s)
2021-01-20 12:24:03 asr2/radeonvii3 Stopping, please wait..
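For readers decoding the "FFT: 6M 1K:12:256 (17.63 bpw)" line that these runs print, here is a small arithmetic sketch. It assumes that the FFT length in words is width x middle x small height x 2, and that bpw is simply exponent / FFT length. Both are inferences from the numbers in the log (WIDTH=1024, MIDDLE=12, SMALL_HEIGHT=256), not something the log itself states. The function names are illustrative only.

```python
def fft_words(width, middle, small_height):
    # Assumption: word count = width * middle * small_height * 2.
    # This is inferred from the log values, not taken from program documentation.
    return width * middle * small_height * 2


def bits_per_word(exponent, words):
    # Average number of exponent bits carried by each FFT word.
    return exponent / words


# Values copied from the log: exponent 110899639, FFT 1K:12:256.
words = fft_words(1024, 12, 256)
bpw = bits_per_word(110899639, words)

print(words)               # FFT length in words
print(words // (1 << 20))  # the same length in units of M (2^20) words
print(round(bpw, 2))       # bits per word, rounded as the log shows it
```

Under these assumptions the three printed values reproduce the 6M and 17.63 bpw figures from the log line.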
Code:
2021-01-20 12:10:14 asr2/radeonvii3 110899639 FFT: 6M 1K:12:256 (17.63 bpw)
2021-01-20 12:10:14 asr2/radeonvii3 Expected maximum carry32: 39160000
2021-01-20 12:10:16 asr2/radeonvii3 OpenCL args "-DEXP=110899639u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=12u -DPM1=0 -DAMDGPU=1 -DWEIGHT_STEP_MINUS_1=0x9.70d2e6d4d6eb8p-5 -DIWEIGHT_STEP_MINUS_1=-0xe.947b562a8bfep-6 -DNO_ASM=1  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2021-01-20 12:10:20 asr2/radeonvii3 OpenCL compilation in 4.42 s
2021-01-20 12:10:21 asr2/radeonvii3 110899639 OK        0 loaded: blockSize 400, 0000000000000003
2021-01-20 12:10:21 asr2/radeonvii3 validating proof residues for power 8
2021-01-20 12:10:21 asr2/radeonvii3 Proof using power 8
2021-01-20 12:10:23 asr2/radeonvii3 110899639 OK      800   0.00%;  884 us/it; ETA 1d 03:13; 6191b4b775c8edca (check 0.59s)
2021-01-20 12:13:19 asr2/radeonvii3 110899639 OK   200000   0.18%;  884 us/it; ETA 1d 03:10; b56e64d2ec39cd4d (check 0.59s)
2021-01-20 12:14:55 asr2/radeonvii3 Stopping, please wait..
2021-01-20 12:14:55 asr2/radeonvii3 110899639 OK   308400   0.28%;  882 us/it; ETA 1d 03:05; 9e394f7f61bb85d7 (check 0.60s)
2021-01-20 12:14:56 asr2/radeonvii3 Exiting because "stop requested"
2021-01-20 12:14:56 asr2/radeonvii3 Bye
2021-01-20 12:15:23 config: -user kriesel -cpu asr2/radeonvii3 -d 3 -use NO_ASM -maxAlloc 13000 -log 10000
2021-01-20 12:15:23 device 3, unique id ''
2021-01-20 12:15:23 asr2/radeonvii3 110899639 FFT: 6M 1K:12:256 (17.63 bpw)
2021-01-20 12:15:23 asr2/radeonvii3 Expected maximum carry32: 39160000
2021-01-20 12:15:24 asr2/radeonvii3 OpenCL args "-DEXP=110899639u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=12u -DPM1=0 -DAMDGPU=1 -DWEIGHT_STEP_MINUS_1=0x9.70d2e6d4d6eb8p-5 -DIWEIGHT_STEP_MINUS_1=-0xe.947b562a8bfep-6 -DNO_ASM=1  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2021-01-20 12:15:29 asr2/radeonvii3 OpenCL compilation in 4.55 s
2021-01-20 12:15:30 asr2/radeonvii3 110899639 OK   308400 loaded: blockSize 400, 9e394f7f61bb85d7
2021-01-20 12:15:30 asr2/radeonvii3 validating proof residues for power 8
2021-01-20 12:15:30 asr2/radeonvii3 Proof using power 8
2021-01-20 12:15:31 asr2/radeonvii3 110899639 OK   309200   0.28%;  895 us/it; ETA 1d 03:29; f60084731d7963cc (check 0.59s)
2021-01-20 12:15:32 asr2/radeonvii3 110899639 OK   310000   0.28%;  885 us/it; ETA 1d 03:11; 12f9b4443a1ca408 (check 0.59s)
2021-01-20 12:15:42 asr2/radeonvii3 110899639 OK   320000   0.29%;  886 us/it; ETA 1d 03:12; f2d36a6ab5abc361 (check 0.59s)
2021-01-20 12:15:51 asr2/radeonvii3 110899639 OK   330000   0.30%;  884 us/it; ETA 1d 03:09; 44b44f2c3550f717 (check 0.59s)
2021-01-20 12:16:01 asr2/radeonvii3 110899639 OK   340000   0.31%;  886 us/it; ETA 1d 03:13; b30686ce36dcf10c (check 0.61s)
2021-01-20 12:16:10 asr2/radeonvii3 110899639 OK   350000   0.32%;  883 us/it; ETA 1d 03:08; b04af45d28e73cc9 (check 0.61s)
2021-01-20 12:16:20 asr2/radeonvii3 110899639 OK   360000   0.32%;  882 us/it; ETA 1d 03:04; fe9ea80343df78e1 (check 0.59s)
2021-01-20 12:16:29 asr2/radeonvii3 110899639 OK   370000   0.33%;  882 us/it; ETA 1d 03:04; 6afc77809e993a9f (check 0.59s)
2021-01-20 12:16:39 asr2/radeonvii3 110899639 OK   380000   0.34%;  881 us/it; ETA 1d 03:03; 280d5489847a1cec (check 0.59s)
2021-01-20 12:16:48 asr2/radeonvii3 110899639 OK   390000   0.35%;  882 us/it; ETA 1d 03:05; a46e45e9343c52a9 (check 0.59s)
2021-01-20 12:16:58 asr2/radeonvii3 110899639 OK   400000   0.36%;  882 us/it; ETA 1d 03:05; 9e6c6fb76ef72a19 (check 0.59s)
2021-01-20 12:17:07 asr2/radeonvii3 110899639 OK   410000   0.37%;  881 us/it; ETA 1d 03:03; f861f42a6182792e (check 0.60s)
2021-01-20 12:17:16 asr2/radeonvii3 110899639 OK   420000   0.38%;  881 us/it; ETA 1d 03:02; a63fbc909d404859 (check 0.60s)
2021-01-20 12:17:26 asr2/radeonvii3 110899639 OK   430000   0.39%;  882 us/it; ETA 1d 03:04; 496ebdb31daffa63 (check 0.62s)
2021-01-20 12:17:36 asr2/radeonvii3 110899639 OK   440000   0.40%;  914 us/it; ETA 1d 04:02; f1d3e4d3f9bff432 (check 0.59s)
2021-01-20 12:17:45 asr2/radeonvii3 110899639 OK   450000   0.41%;  881 us/it; ETA 1d 03:02; fbf71e75373c1a72 (check 0.59s)
2021-01-20 12:17:54 asr2/radeonvii3 110899639 OK   460000   0.41%;  881 us/it; ETA 1d 03:02; 42efdc145cda3529 (check 0.61s)
2021-01-20 12:18:04 asr2/radeonvii3 110899639 OK   470000   0.42%;  890 us/it; ETA 1d 03:19; 93134f128a91d38e (check 0.66s)
2021-01-20 12:18:13 asr2/radeonvii3 110899639 OK   480000   0.43%;  884 us/it; ETA 1d 03:07; fd1c5887489a268f (check 0.61s)
2021-01-20 12:18:23 asr2/radeonvii3 110899639 OK   490000   0.44%;  883 us/it; ETA 1d 03:05; 1f58ebc4c56caa98 (check 0.59s)
2021-01-20 12:18:32 asr2/radeonvii3 110899639 OK   500000   0.45%;  886 us/it; ETA 1d 03:10; a5e9c983cefcd245 (check 0.59s)
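The logs above pass through the same iteration counts more than once (for example 200000 and 310000), so the res64 values can be cross-checked between runs. The following is a minimal sketch of that check, not part of any GIMPS program. The regular expression is written only against the line layout visible in these logs, and the sample lines are copied from them. It assumes the residues are reported unshifted for the same exponent, so that a disagreement at the same iteration would point to an error in one of the runs.

```python
import re
from collections import defaultdict

# Matches progress lines only. "loaded:" and "Stopping" lines lack the
# percentage field and are skipped.
PROGRESS = re.compile(
    r"^(?P<date>\S+) (?P<time>\S+) (?P<host>\S+) (?P<exp>\d+) (?P<status>\w+)\s+"
    r"(?P<iter>\d+)\s+[\d.]+%;\s+(?P<us>\d+) us/it; ETA .*?; (?P<res>[0-9a-f]{16}) "
)

# A few lines copied from the logs in this post.
SAMPLE = """\
2021-01-20 12:13:19 asr2/radeonvii3 110899639 OK   200000   0.18%;  884 us/it; ETA 1d 03:10; b56e64d2ec39cd4d (check 0.59s)
2021-01-20 12:15:30 asr2/radeonvii3 110899639 OK   308400 loaded: blockSize 400, 9e394f7f61bb85d7
2021-01-20 12:15:32 asr2/radeonvii3 110899639 OK   310000   0.28%;  885 us/it; ETA 1d 03:11; 12f9b4443a1ca408 (check 0.59s)
2021-01-20 12:22:17 asr2/radeonvii3 110899639 OK   200000   0.18%;  882 us/it; ETA 1d 03:07; b56e64d2ec39cd4d (check 0.59s)
2021-01-20 12:24:01 asr2/radeonvii3 110899639 OK   310000   0.28%;  882 us/it; ETA 1d 03:07; 12f9b4443a1ca408 (check 0.59s)
"""


def residues_by_iteration(lines):
    """Collect every res64 seen for each (exponent, iteration) pair."""
    seen = defaultdict(list)
    for line in lines:
        m = PROGRESS.match(line)
        if m:
            seen[(int(m["exp"]), int(m["iter"]))].append(m["res"])
    return seen


def mismatches(seen):
    """Return only the (exponent, iteration) pairs whose residues disagree."""
    return {k: sorted(set(v)) for k, v in seen.items() if len(set(v)) > 1}


seen = residues_by_iteration(SAMPLE.splitlines())
print(mismatches(seen))  # an empty dict means all repeated iterations agree
```

To check a whole log file, pass the open file object to residues_by_iteration instead of the sample lines.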
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2021-01-24 at 19:45
Old 2021-07-02, 21:35   #9
kriesel
 
Default Mlucas v19.0 -h help output

As generated by the program:
Code:
    Mlucas 19.0

    http://www.mersenneforum.org/mayer/README.html

INFO: using 64-bit-significand form of floating-double rounding constant for scalar-mode DNINT emulation.
INFO: testing FFT radix tables...
For the full list of command line options, run the program with the -h flag.
For a list of command-line options grouped by type, run the program with the -topic flag.

Mlucas command line options:

         Symbol and abbreviation key:
               <CR> :  carriage return
                |   :  separator for one-of-the-following multiple-choice menus
               []   :  encloses optional arguments
               {}   :  denotes user-supplied numerical arguments of the type noted.
                      ({int} means nonnegative integer, {+int} = positive int, {float} = float.)
          -argument :  Vertical stacking indicates argument short 'nickname' options,
          -arg      :  e.g. in this example '-arg' can be used in place of '-argument'.

         Supported arguments:

         <CR>        Default mode: looks for a worktodo.ini file in the local
                     directory; if none found, prompts for manual keyboard entry

Help submenus by topic. No additional arguments may follow the displayed ones:
 -s            Post-build self-testing for various FFT-length rnages.
 -fftlen       FFT-length setting.
 -radset       FFT radix-set specification.
 -m[ersenne]   Mersenne-number primality testing.
 -f[ermat]     Fermat-number primality testing.
 -shift        ***SIMD builds only*** Number of bits by which to shift the initial seed (= iteration-0 residue).
 -prp          Probable-primality testing mode.
 -iters        Iteration-number setting.
 -nthread|cpu  Setting threadcount and CPU core affinity.

 *** NOTE: *** The following self-test options will cause an mlucas.cfg file containing
     the optimal FFT radix set for the runlength(s) tested to be created (if one did not
     exist previously) or appended (if one did) with new timing data. Such a file-write is
     triggered by each complete set of FFT radices available at a given FFT length being
     tested, i.e. by a self-test without a user-specified -radset argument.
     (A user-specific Mersenne exponent may be supplied via the -m flag; if none is specified,
     the program will use the largest permissible exponent for the given FFT length, based on
     its internal length-setting algorithm). The user must specify the number of iterations for
     the self-test via the -iters flag; while it is not required, it is strongly recommended to
     stick to one of the standard timing-test values of -iters = [100,1000,10000], with the larger
     values being preferred for multithreaded timing tests, in order to assure a decently large
     slice of CPU time. Similarly, it is recommended to not use the -m flag for such tests, unless
     roundoff error levels on a given compute platform are such that the default exponent at one or
     more FFT lengths of interest prevents a reasonable sampling of available radix sets at same.
        If the user lets the program set the exponent and uses one of the aforementioned standard
     self-test iteration counts, the resulting best-timing FFT radix set will only be written to the
     resulting mlucas.cfg file if the timing-test result matches the internally- stored precomputed
     one for the given default exponent at the iteration count in question, with eligible radix sets
     consisting of those for which the roundoff error remains below an acceptable threshold.
     If the user instead specifies the exponent (only allowed for a single-FFT-length timing test)****************
     and/or a non-default iteration number, the resulting best-timing FFT radix set will only be
     written to the resulting mlucas.cfg file if the timing-test results match each other? ********* check logic here
     This is important for tuning code parameters to your particular platform.

   FOR BEST RESULTS, RUN ANY SELF-TESTS UNDER ZERO- OR CONSTANT-LOAD CONDITIONS

 -s {...}    Self-test, user must also supply exponent [via -m or -f] and/or FFT length to use.

 -s tiny     Runs 100-iteration self-tests on set of  32 Mersenne exponents, ranging from 173431 to 2455003
 -s t        This will take around 1 minute on a fast CPU..

 -s small    Runs 100-iteration self-tests on set of  32 Mersenne exponents, ranging from 173431 to 2455003
 -s s        This will take around 10 minutes on a fast CPU..

**** THIS IS THE ONLY SELF-TEST ORDINARY USERS ARE RECOMMENDED TO DO: ******
*                                                                          *
* -s medium   Runs set of  16 Mersenne exponents, ranging from 2614999 to 9530803
* -s m        This will take around an hour on a fast CPU.                 *
*                                                                          *
****************************************************************************

 -s large    Runs set of  24 Mersenne exponents, ranging from 10151971 to 72123137
 -s l        This will take around an hour on a fast CPU.

 -s huge     Runs set of  16 Mersenne exponents, ranging from 76821337 to 282508657
 -s h        This will take a couple of hours on a fast CPU.

 -s all      Runs 100-iteration self-tests of all test Mersenne exponents and all FFT radix sets.
 -s a        This will take several hours on a fast CPU.

 -fftlen {+int}   If {+int} is one of the available FFT lengths (in Kilodoubles), runs all
             all available FFT radices available at that length, unless the -radset flag is
             invoked (see below for details). If -fftlen is invoked without the -iters flag,
             it is assumed the user wishes to do a production run with a non-default FFT length,
             In this case the program requires a valid worktodo.ini-file entry with exponent
             not more than 5% larger than the default maximum for that FFT length.
                  If -fftlen is invoked with a user-supplied value of -iters but without a
             user-supplied exponent, the program will do the specified number of iterations
             using the default self-test Mersenne or Fermat exponent for that FFT length.
                  If -fftlen is invoked with a user-supplied value of -iters and either the
             -m or -f flag and a user-supplied exponent, the program will do the specified
             number of iterations of either the Lucas-Lehmer test with starting value 4 (-m)
             or the Pe'pin test with starting value 3 (-f) on the user-specified modulus.

             In either of the latter 2 cases, the program will produce a cfg-file entry based
             on the timing results, assuming at least one radix set ran the specified #iters
             to completion without suffering a fatal error of some kind.
             Use this to find the optimal radix set for a single FFT length on your hardware.

             NOTE: IF YOU USE OTHER THAN THE DEFAULT MODULUS OR #ITERS FOR SUCH A SINGLE-FFT-
             LENGTH TIMING TEST, IT IS UP TO YOU TO MANUALLY VERIFY THAT THE RESIDUES OUTPUT
             MATCH FOR ALL FFT RADIX COMBINATIONS AND THE ROUNDOFF ERRORS ARE REASONABLE!

 -radset {int}    Specific index of a set of complex FFT radices to use, based on the big
             select table in the function get_fft_radices(). Requires a supported value of
             -fftlen to also be specified, as well as a value of -iters for the timing test.

 -m [{+int}] Performs a Lucas-Lehmer primality test of the Mersenne number M(int) = 2^int - 1,
             where int must be an odd prime. If -iters is also invoked, this indicates a timing test.
             and requires suitable added arguments (-fftlen and, optionally, -radset) to be supplied.
                If the -fftlen option (and optionally -radset) is also invoked but -iters is not, the
             program first checks the first line of the worktodo.ini file to see if the assignment
             specified there is a Lucas-Lehmer test with the same exponent as specified via the -m
             argument. If so, the -fftlen argument is treated as a user override of the default FFT
             length for the exponent. If -radset is also invoked, this is similarly treated as a user-
             specified radix set for the user-set FFT length; otherwise the program will use the cfg file
             to select the radix set to be used for the user-forced FFT length.
                If the worktodo.ini file entry does not match the -m value, a set of timing self-tests is
             run on the user-specified Mersenne number using all sets of FFT radices available at the
             specified FFT length.
                If the -fftlen option is not invoked, the self-tests use all sets of
             FFT radices available at that exponent's default FFT length.
                Use this to find the optimal radix set for a single given Mersenne number
             exponent on your hardware, similarly to the -fftlen option.
                Performs as many iterations as specified via the -iters flag [required].

 -f {int}    Performs a base-3 Pe'pin test on the Fermat number F(num) = 2^(2^num) + 1.
                If desired this can be invoked together with the -fftlen option.
             as for the Mersenne-number self-tests (see notes about the -m flag;
             note that not all FFT lengths supported for -m are available for -f).
             Optimal radix sets and timings are written to a fermat.cfg file.
                Performs as many iterations as specified via the -iters flag [required].

 -shift         ***SIMD builds only*** Bits by which to circular-left-shift the initial seed.
             This shift count is doubled (modulo the number of bits of the modulus being tested)
             each iteration. Savefile residues are rightward-shifted by the current shift count
             before being written to the file; thus savefiles contain the unshifted residue, and
             separately the current shift count, which the program uses to leftward-shift the
             savefile residue when the program is restarted from interrupt.
                The shift count is a 64-bit unsigned int (e.g. to accommodate Fermat numbers > F32).

 -prp {int}     Instead of running the rigorous primality test defined for the modulus type
             in question (Lucas-Lehmer test for Mersenne numbers, Pe'pin test for Fermat numbers
             do a probably-primality test to the specified integer base b = {int}.
                For a Mersenne number M(p), starting with initial seed x = b (which must not = 2
             or a power of 2), this means do a Fermat-PRP test, consisting of (p-2) iterations of
             form x = b*x^2 (mod M(p)) plus a final mod-squaring x = x^2 (mod M(p)), with M(p) being
             a probable-prime to base b if the result == 1.
                For a Fermat number F(m), starting with initial seed x = b (which must not = 2
             or a power of 2), this means do an Euler-PRP test (referred to as a Pe'pin test for these
             moduli), i.e. do 2^m-1 iterations of form x = b*x^2 (mod M(p)), with M(p) being not merely
             a probable prime but in fact deterministically a prime if the result == -1. The reason we
             still use the -prp flag in the Fermat case is for legacy-code compatibility: All pre-v18
             Mlucas versions supported only Pe'pin testing to base b = 3; now the user can use the -prp
             flag with a suitable base-value to override this default choice of base.

 -iters {int}   Do {int} self-test iterations of the type determined by the
             modulus-related options (-s/-m = Lucas-Lehmer test iterations with
             initial seed 4, -f = Pe'pin-test squarings with initial seed 3.
Note: the identical exponent range shown for -s tiny and -s small is a documentation error. I think that error ripples all the way through -s huge, so the help text understates the program's capability.
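The -shift paragraph in the help output above says the shift count doubles (modulo the bit length of the modulus) each iteration, and that savefiles hold the unshifted residue. The toy sketch below illustrates that bookkeeping for a Lucas-Lehmer test on a small Mersenne number. It uses plain Python integers rather than an FFT, so it is not how Mlucas implements it, and the function name is made up for illustration. It relies only on the fact that multiplying by 2^s mod 2^p - 1 is a circular left shift by s bits.

```python
def ll_residues(p, iters, shift=0):
    """Return the unshifted Lucas-Lehmer residues after each of `iters` iterations.

    The working value x is kept circularly left-shifted by s bits, i.e.
    x = residue * 2^s mod M(p), to mimic the behaviour the help text describes.
    """
    m = (1 << p) - 1
    s = shift % p
    x = (4 << s) % m                  # shifted seed: 4 * 2^s mod M(p)
    out = []
    for _ in range(iters):
        s = (2 * s) % p               # squaring doubles the shift count, mod p bits
        x = (x * x - (2 << s)) % m    # the constant 2 must carry the new shift too
        out.append((x << (p - s)) % m)  # unshift, as would be done before a savefile write
    return out


p = 13                                # M(13) = 8191 is prime
plain = ll_residues(p, p - 2)         # no shift
shifted = ll_residues(p, p - 2, shift=5)
print(plain == shifted)               # unshifted residues agree at every iteration
print(plain[-1])                      # 0, since M(13) is prime
```

The comparison holds for any shift value, because 2^p is congruent to 1 mod M(p), so shifting left by p - s bits undoes a shift of s bits.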


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2023-06-13 at 13:50
Old 2021-08-12, 14:32   #10
kriesel
 
Default Mlucas v19.1 -h help output

The INFO portion will vary depending on the system it is run on.
Code:
~/mlucas_v19.1/mlucas_v19.1$ ./Mlucas-avx2 -h

    Mlucas 19.1

    http://www.mersenneforum.org/mayer/README.html

INFO: testing qfloat routines...
CPU Family = x86_64, OS = Linux, 64-bit Version, compiled with Gnu C [or other compatible], Version 7.4.0.
INFO: Build uses AVX2 instruction set.
INFO: Using inline-macro form of MUL_LOHI64.
INFO: Using FMADD-based 100-bit modmul routines for factoring.
INFO: MLUCAS_PATH is set to ""
INFO: using 64-bit-significand form of floating-double rounding constant for scalar-mode DNINT emulation.
Setting DAT_BITS = 10, PAD_BITS = 2
INFO: testing IMUL routines...
INFO: System has 12 available processor cores.
INFO: testing FFT radix tables...
For the full list of command line options, run the program with the -h flag.
For a list of command-line options grouped by type, run the program with the -topic flag.

Mlucas command line options:

         Symbol and abbreviation key:
               <CR> :  carriage return
                |   :  separator for one-of-the-following multiple-choice menus
               []   :  encloses optional arguments
               {}   :  denotes user-supplied numerical arguments of the type noted.
                      ({int} means nonnegative integer, {+int} = positive int, {float} = float.)
          -argument :  Vertical stacking indicates argument short 'nickname' options,
          -arg      :  e.g. in this example '-arg' can be used in place of '-argument'.

         Supported arguments:

         <CR>        Default mode: looks for a worktodo.ini file in the local
                     directory; if none found, prompts for manual keyboard entry

Help submenus by topic. No additional arguments may follow the displayed ones:
 -s            Post-build self-testing for various FFT-length rnages.
 -fftlen       FFT-length setting.
 -radset       FFT radix-set specification.
 -m[ersenne]   Mersenne-number primality testing.
 -f[ermat]     Fermat-number primality testing.
 -shift        ***SIMD builds only*** Number of bits by which to shift the initial seed (= iteration-0 residue).
 -prp          Probable-primality testing mode.
 -iters        Iteration-number setting.
 -nthread|cpu  Setting threadcount and CPU core affinity.

 *** NOTE: *** The following self-test options will cause an mlucas.cfg file containing
     the optimal FFT radix set for the runlength(s) tested to be created (if one did not
     exist previously) or appended (if one did) with new timing data. Such a file-write is
     triggered by each complete set of FFT radices available at a given FFT length being
     tested, i.e. by a self-test without a user-specified -radset argument.
     (A user-specific Mersenne exponent may be supplied via the -m flag; if none is specified,
     the program will use the largest permissible exponent for the given FFT length, based on
     its internal length-setting algorithm). The user must specify the number of iterations for
     the self-test via the -iters flag; while it is not required, it is strongly recommended to
     stick to one of the standard timing-test values of -iters = [100,1000,10000], with the larger
     values being preferred for multithreaded timing tests, in order to assure a decently large
     slice of CPU time. Similarly, it is recommended to not use the -m flag for such tests, unless
     roundoff error levels on a given compute platform are such that the default exponent at one or
     more FFT lengths of interest prevents a reasonable sampling of available radix sets at same.
        If the user lets the program set the exponent and uses one of the aforementioned standard
     self-test iteration counts, the resulting best-timing FFT radix set will only be written to the
     resulting mlucas.cfg file if the timing-test result matches the internally- stored precomputed
     one for the given default exponent at the iteration count in question, with eligible radix sets
     consisting of those for which the roundoff error remains below an acceptable threshold.
     If the user instead specifies the exponent (only allowed for a single-FFT-length timing test)****************
     and/or a non-default iteration number, the resulting best-timing FFT radix set will only be
     written to the resulting mlucas.cfg file if the timing-test results match each other? ********* check logic here *******
     This is important for tuning code parameters to your particular platform.

   FOR BEST RESULTS, RUN ANY SELF-TESTS UNDER ZERO- OR CONSTANT-LOAD CONDITIONS

 -s {...}    Self-test, user must also supply exponent [via -m or -f] and/or FFT length to use.

 -s tiny     Runs 100-iteration self-tests on set of  32 Mersenne exponents, ranging from 173431 to 2455003
 -s t        This will take around 1 minute on a fast CPU..

 -s small    Runs 100-iteration self-tests on set of  32 Mersenne exponents, ranging from 173431 to 2455003
 -s s        This will take around 10 minutes on a fast CPU..

**** THIS IS THE ONLY SELF-TEST ORDINARY USERS ARE RECOMMENDED TO DO: ******
*                                                                          *
* -s medium   Runs set of  16 Mersenne exponents, ranging from 2614999 to 9530803
* -s m        This will take around an hour on a fast CPU.                 *
*                                                                          *
****************************************************************************

 -s large    Runs set of  24 Mersenne exponents, ranging from 10151971 to 72123137
 -s l        This will take around an hour on a fast CPU.

 -s huge     Runs set of  16 Mersenne exponents, ranging from 76821337 to 282508657
 -s h        This will take a couple of hours on a fast CPU.

 -s all      Runs 100-iteration self-tests of all test Mersenne exponents and all FFT radix sets.
 -s a        This will take several hours on a fast CPU.

 -fftlen {+int}   If {+int} is one of the available FFT lengths (in Kilodoubles), runs all
             all available FFT radices available at that length, unless the -radset flag is
             invoked (see below for details). If -fftlen is invoked without the -iters flag,
             it is assumed the user wishes to do a production run with a non-default FFT length,
             In this case the program requires a valid worktodo.ini-file entry with exponent
             not more than 5% larger than the default maximum for that FFT length.
                  If -fftlen is invoked with a user-supplied value of -iters but without a
             user-supplied exponent, the program will do the specified number of iterations
             using the default self-test Mersenne or Fermat exponent for that FFT length.
                  If -fftlen is invoked with a user-supplied value of -iters and either the
             -m or -f flag and a user-supplied exponent, the program will do the specified
             number of iterations of either the Lucas-Lehmer test with starting value 4 (-m)
             or the Pe'pin test with starting value 3 (-f) on the user-specified modulus.

             In either of the latter 2 cases, the program will produce a cfg-file entry based
             on the timing results, assuming at least one radix set ran the specified #iters
             to completion without suffering a fatal error of some kind.
             Use this to find the optimal radix set for a single FFT length on your hardware.

             NOTE: IF YOU USE OTHER THAN THE DEFAULT MODULUS OR #ITERS FOR SUCH A SINGLE-FFT-
             LENGTH TIMING TEST, IT IS UP TO YOU TO MANUALLY VERIFY THAT THE RESIDUES OUTPUT
             MATCH FOR ALL FFT RADIX COMBINATIONS AND THE ROUNDOFF ERRORS ARE REASONABLE!

 -radset {int}    Specific index of a set of complex FFT radices to use, based on the big
             select table in the function get_fft_radices(). Requires a supported value of
             -fftlen to also be specified, as well as a value of -iters for the timing test.

 -m [{+int}] Performs a Lucas-Lehmer primality test of the Mersenne number M(int) = 2^int - 1,
             where int must be an odd prime. If -iters is also invoked, this indicates a timing test.
             and requires suitable added arguments (-fftlen and, optionally, -radset) to be supplied.
                If the -fftlen option (and optionally -radset) is also invoked but -iters is not, the
             program first checks the first line of the worktodo.ini file to see if the assignment
             specified there is a Lucas-Lehmer test with the same exponent as specified via the -m
             argument. If so, the -fftlen argument is treated as a user override of the default FFT
             length for the exponent. If -radset is also invoked, this is similarly treated as a user-
             specified radix set for the user-set FFT length; otherwise the program will use the cfg file
             to select the radix set to be used for the user-forced FFT length.
                If the worktodo.ini file entry does not match the -m value, a set of timing self-tests is
             run on the user-specified Mersenne number using all sets of FFT radices available at the
             specified FFT length.
                If the -fftlen option is not invoked, the self-tests use all sets of
             FFT radices available at that exponent's default FFT length.
                Use this to find the optimal radix set for a single given Mersenne number
             exponent on your hardware, similarly to the -fftlen option.
                Performs as many iterations as specified via the -iters flag [required].

 -f {int}    Performs a base-3 Pe'pin test on the Fermat number F(num) = 2^(2^num) + 1.
                If desired this can be invoked together with the -fftlen option.
             as for the Mersenne-number self-tests (see notes about the -m flag;
             note that not all FFT lengths supported for -m are available for -f).
             Optimal radix sets and timings are written to a fermat.cfg file.
                Performs as many iterations as specified via the -iters flag [required].

 -shift         ***SIMD builds only*** Bits by which to circular-left-shift the initial seed.
             This shift count is doubled (modulo the number of bits of the modulus being tested)
             each iteration. Savefile residues are rightward-shifted by the current shift count
             before being written to the file; thus savefiles contain the unshifted residue, and
             separately the current shift count, which the program uses to leftward-shift the
             savefile residue when the program is restarted from interrupt.
                The shift count is a 64-bit unsigned int (e.g. to accommodate Fermat numbers > F32).

 -prp {int}     Instead of running the rigorous primality test defined for the modulus type
             in question (Lucas-Lehmer test for Mersenne numbers, Pe'pin test for Fermat numbers
             do a probably-primality test to the specified integer base b = {int}.
                For a Mersenne number M(p), starting with initial seed x = b (which must not = 2
             or a power of 2), this means do a Fermat-PRP test, consisting of (p-2) iterations of
             form x = b*x^2 (mod M(p)) plus a final mod-squaring x = x^2 (mod M(p)), with M(p) being
             a probable-prime to base b if the result == 1.
                For a Fermat number F(m), starting with initial seed x = b (which must not = 2
             or a power of 2), this means do an Euler-PRP test (referred to as a Pe'pin test for these
             moduli), i.e. do 2^m-1 iterations of form x = b*x^2 (mod M(p)), with M(p) being not merely
             a probable prime but in fact deterministically a prime if the result == -1. The reason we
             still use the -prp flag in the Fermat case is for legacy-code compatibility: All pre-v18
             Mlucas versions supported only Pe'pin testing to base b = 3; now the user can use the -prp
             flag with a suitable base-value to override this default choice of base.

 -iters {int}   Do {int} self-test iterations of the type determined by the
             modulus-related options (-s/-m = Lucas-Lehmer test iterations with
             initial seed 4, -f = Pe'pin-test squarings with initial seed 3.

 -nthread {int}   For multithread-enabled builds, run with this many threads.
     If the user does not specify a thread count, the default is to run single-threaded
     with that thread's affinity set to logical core 0.

     AFFINITY: The code will attempt to set the affinity of the resulting threads
     0:n-1 to the same-indexed processor cores - whether this means distinct physical
     cores is entirely up to the CPU vendor - E.g. Intel uses such a numbering scheme
     but AMD does not. For this reason as of v17 this option is deprecated in favor of
     the -cpu flag, whose usage is detailed below, with the online README page providing
     guidance for the core-numbering schemes of popular CPU vendors.

     If n exceeds the available number of logical processor cores (call it #cpu), the
     program will halt with an error message.

     For greater control over affinity setting, use the -cpu option, which supports two
     distinct core-specification syntaxes (which may be mixed together), as follows:

     -cpu {lo[:hi[:incr]]}   (All args {int} here) Set thread/CPU affinity.
     NOTE: This flag and -nthread are mutually exclusive: If -cpu is used, the threadcount
     is inferred from the numeric-argument-triplet which follows. If only the 'lo' argument
     of the triplet is supplied, this means 'run single-threaded with affinity to CPU {lo}.'
     If the increment (third) argument of the triplet is omitted, it is taken as incr = 1.
     The CPU set encoded by the integer-triplet argument to -cpu corresponds to the
     values of the integer loop index i in the C-loop for(i = lo; i <= hi; i += incr),
     excluding the loop-exit value of i. Thus '-cpu 0:3' and '-cpu 0:3:1' are both
     exactly equivalent to '-nthread 4', whereas '-cpu 0:6:2' and '-cpu 0:7:2' both
     specify affinity setting to cores 0,2,4,6, assuming said cores exist.
     Lastly, note that no whitespace is permitted within the colon-separated numeric field.

     -cpu {triplet0[,triplet1,...]}   This is simply an extended version of the above affinity-
     setting syntax in which each of the comma-separated 'triplet' subfields is in the above
     form and, analogously to the one-triplet-only version, no whitespace is permitted within
     the colon-and-comma-separated numeric field. Thus '-cpu 0:3,8:11' and '-cpu 0:3:1,8:11:1'
     both specify an 8-threaded run with affinity set to the core quartets 0-3 and 8-11,
     whereas '-cpu 0:3:2,8:11:2' means run 4-threaded on cores 0,2,8,10. As described for the
     -nthread option, it is an error for any core index to exceed the available number of logical
     processor cores.
Note: the identical exponent ranges shown for -s tiny and -s small are a documentation error. I think that error ripples all the way through -s huge, so the help text understates the program's capability.
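
The -cpu triplet syntax in the help output above can be checked with a small script. The following is only a sketch in Python, not Mlucas code: the function name expand_cpu_spec and the optional ncpu argument are hypothetical, and it assumes core indices are 0-based, so that valid indices run from 0 to ncpu-1.

```python
def expand_cpu_spec(spec, ncpu=None):
    """Expand a -cpu style string such as '0:3:2,8:11:2' into a list of core indices.

    Follows the help text: each comma-separated field is lo[:hi[:incr]],
    hi defaults to lo (single core), incr defaults to 1, and the cores are the
    values of i in the C loop for(i = lo; i <= hi; i += incr).
    """
    if any(ch.isspace() for ch in spec):
        raise ValueError("no whitespace is permitted in the core specification")
    cores = []
    for triplet in spec.split(","):
        fields = [int(f) for f in triplet.split(":")]  # int('') raises ValueError
        if len(fields) > 3:
            raise ValueError("too many fields in %r" % triplet)
        lo = fields[0]
        hi = fields[1] if len(fields) > 1 else lo
        incr = fields[2] if len(fields) > 2 else 1
        if lo < 0 or hi < lo or incr < 1:
            raise ValueError("bad triplet %r" % triplet)
        cores.extend(range(lo, hi + 1, incr))
    # Assumption: indices are 0-based, so index ncpu itself is already out of range.
    if ncpu is not None and any(c >= ncpu for c in cores):
        raise ValueError("core index exceeds the available logical cores")
    return cores
```

For example, expand_cpu_spec("0:6:2") and expand_cpu_spec("0:7:2") both give cores 0,2,4,6, and the thread count Mlucas would infer is simply the length of the returned list. The sketch does not remove duplicate core indices; how Mlucas itself treats overlapping triplets is not stated in the help text.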


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2023-06-13 at 13:51 Reason: added note
Old 2021-08-13, 16:24   #11
kriesel
 
Default Mlucas V20.0 -h help output

./Mlucas -h produces reduced output that includes an error message. As a workaround, use ./Mlucas -h printall
The INFO portion will vary depending on the system it is run on.
There does not appear to be any P-1-specific help output available at this time.

Code:
~/mlucas_v20/obj$ ./Mlucas -h printall

    Mlucas 20.0

    http://www.mersenneforum.org/mayer/README.html

INFO: testing qfloat routines...
System total RAM = 16243, free RAM = 287
INFO: 287 MB of free system RAM detected; will use up to 90% = 258 MB of that, unless user specifies a lower fraction via -maxalloc.
CPU Family = x86_64, OS = Linux, 64-bit Version, compiled with Gnu C [or other compatible], Version 7.4.0.
INFO: Build uses AVX2 instruction set.
INFO: Using inline-macro form of MUL_LOHI64.
INFO: Using FMADD-based 100-bit modmul routines for factoring.
INFO: MLUCAS_PATH is set to ""
INFO: using 64-bit-significand form of floating-double rounding constant for scalar-mode DNINT emulation.
Setting DAT_BITS = 10, PAD_BITS = 2
INFO: testing IMUL routines...
INFO: System has 12 available processor cores.
INFO: testing FFT radix tables...
For the full list of command line options, run the program with the -h flag.
For a list of command-line options grouped by type, run the program with the -topic flag.

Mlucas command line options:

         Symbol and abbreviation key:
               <CR> :  carriage return
                |   :  separator for one-of-the-following multiple-choice menus
               []   :  encloses optional arguments
               {}   :  denotes user-supplied numerical arguments of the type noted.
                      ({int} means nonnegative integer, {+int} = positive int, {float} = float.)
          -argument :  Vertical stacking indicates argument short 'nickname' options,
          -arg      :  e.g. in this example '-arg' can be used in place of '-argument'.

         Supported arguments:

         <CR>        Default mode: looks for a worktodo.ini file in the local
                     directory; if none found, prompts for manual keyboard entry

Help submenus by topic. No additional arguments may follow the displayed ones:
 -s            Post-build self-testing for various FFT-length rnages.
 -fft[len]     FFT-length setting.
 -radset       FFT radix-set specification.
 -m[ersenne]   Mersenne-number primality testing.
 -f[ermat]     Fermat-number primality testing.
 -shift        ***SIMD builds only*** Number of bits by which to shift the initial seed (= iteration-0 residue).
 -prp          Probable-primality testing mode.
 -iters        Iteration-number setting.
 -nthread|cpu  Setting threadcount and CPU core affinity.
 -maxalloc     Setting maximum-percentage of available system RAM to use per instance.

 *** NOTE: *** The following self-test options will cause an mlucas.cfg file containing
     the optimal FFT radix set for the runlength(s) tested to be created (if one did not
     exist previously) or appended (if one did) with new timing data. Such a file-write is
     triggered by each complete set of FFT radices available at a given FFT length being
     tested, i.e. by a self-test without a user-specified -radset argument.
     (A user-specific Mersenne exponent may be supplied via the -m flag; if none is specified,
     the program will use the largest permissible exponent for the given FFT length, based on
     its internal length-setting algorithm). The user must specify the number of iterations for
     the self-test via the -iters flag; while it is not required, it is strongly recommended to
     stick to one of the standard timing-test values of -iters = [100,1000,10000], with the larger
     values being preferred for multithreaded timing tests, in order to assure a decently large
     slice of CPU time. Similarly, it is recommended to not use the -m flag for such tests, unless
     roundoff error levels on a given compute platform are such that the default exponent at one or
     more FFT lengths of interest prevents a reasonable sampling of available radix sets at same.
        If the user lets the program set the exponent and uses one of the aforementioned standard
     self-test iteration counts, the resulting best-timing FFT radix set will only be written to the
     resulting mlucas.cfg file if the timing-test result matches the internally- stored precomputed
     one for the given default exponent at the iteration count in question, with eligible radix sets
     consisting of those for which the roundoff error remains below an acceptable threshold.
     If the user instead specifies the exponent (only allowed for a single-FFT-length timing test)****************
     and/or a non-default iteration number, the resulting best-timing FFT radix set will only be
     written to the resulting mlucas.cfg file if the timing-test results match each other? ********* check logic here *******
     This is important for tuning code parameters to your particular platform.

   FOR BEST RESULTS, RUN ANY SELF-TESTS UNDER ZERO- OR CONSTANT-LOAD CONDITIONS

 -s {...}    Self-test, user must also supply exponent [via -m or -f] and/or FFT length to use.

 -s tiny     Runs 100-iteration self-tests on set of  32 Mersenne exponents, ranging from 173431 to 2455003
 -s t        This will take around 1 minute on a fast CPU..

 -s small    Runs 100-iteration self-tests on set of  32 Mersenne exponents, ranging from 173431 to 2455003
 -s s        This will take around 10 minutes on a fast CPU..

**** THIS IS THE ONLY SELF-TEST ORDINARY USERS ARE RECOMMENDED TO DO: ******
*                                                                          *
* -s medium   Runs set of  16 Mersenne exponents, ranging from 2614999 to 9530803
* -s m        This will take around an hour on a fast CPU.                 *
*                                                                          *
****************************************************************************

 -s large    Runs set of  24 Mersenne exponents, ranging from 10151971 to 72123137
 -s l        This will take around an hour on a fast CPU.

 -s huge     Runs set of  16 Mersenne exponents, ranging from 76821337 to 282508657
 -s h        This will take a couple of hours on a fast CPU.

 -s all      Runs 100-iteration self-tests of all test Mersenne exponents and all FFT radix sets.
 -s a        This will take several hours on a fast CPU.

 -fft[len] {+int}   If {+int} is one of the available FFT lengths (in Kilodoubles), runs all
             all available FFT radices available at that length, unless the -radset flag is
             invoked (see below for details). If -fft is invoked without the -iters flag,
             it is assumed the user wishes to do a production run with a non-default FFT length,
             In this case the program requires a valid worktodo.ini-file entry with exponent
             not more than 5% larger than the default maximum for that FFT length.
                  If -fft is invoked with a user-supplied value of -iters but without a
             user-supplied exponent, the program will do the specified number of iterations
             using the default self-test Mersenne or Fermat exponent for that FFT length.
                  If -fft is invoked with a user-supplied value of -iters and either the
             -m or -f flag and a user-supplied exponent, the program will do the specified
             number of iterations of either the Lucas-Lehmer test with starting value 4 (-m)
             or the Pe'pin test with starting value 3 (-f) on the user-specified modulus.

             In either of the latter 2 cases, the program will produce a cfg-file entry based
             on the timing results, assuming at least one radix set ran the specified #iters
             to completion without suffering a fatal error of some kind.
             Use this to find the optimal radix set for a single FFT length on your hardware.

             NOTE: IF YOU USE OTHER THAN THE DEFAULT MODULUS OR #ITERS FOR SUCH A SINGLE-FFT-
             LENGTH TIMING TEST, IT IS UP TO YOU TO MANUALLY VERIFY THAT THE RESIDUES OUTPUT
             MATCH FOR ALL FFT RADIX COMBINATIONS AND THE ROUNDOFF ERRORS ARE REASONABLE!

 -radset {int}    Specific index of a set of complex FFT radices to use, based on the big
             select table in the function get_fft_radices(). Requires a supported value of
             -fft to also be specified, as well as a value of -iters for the timing test.

 -m [{+int}] Performs a Lucas-Lehmer primality test of the Mersenne number M(int) = 2^int - 1,
             where int must be an odd prime. If -iters is also invoked, this indicates a timing test.
             and requires suitable added arguments (-fft and, optionally, -radset) to be supplied.
                If the -fft option (and optionally -radset) is also invoked but -iters is not, the
             program first checks the first line of the worktodo.ini file to see if the assignment
             specified there is a Lucas-Lehmer test with the same exponent as specified via the -m
             argument. If so, the -fft argument is treated as a user override of the default FFT
             length for the exponent. If -radset is also invoked, this is similarly treated as a user-
             specified radix set for the user-set FFT length; otherwise the program will use the cfg file
             to select the radix set to be used for the user-forced FFT length.
                If the worktodo.ini file entry does not match the -m value, a set of timing self-tests is
             run on the user-specified Mersenne number using all sets of FFT radices available at the
             specified FFT length.
                If the -fft option is not invoked, the self-tests use all sets of
             FFT radices available at that exponent's default FFT length.
                Use this to find the optimal radix set for a single given Mersenne number
             exponent on your hardware, similarly to the -fft option.
                Performs as many iterations as specified via the -iters flag [required].

 -f {int}    Performs a base-3 Pe'pin test on the Fermat number F(num) = 2^(2^num) + 1.
                If desired this can be invoked together with the -fft option.
             as for the Mersenne-number self-tests (see notes about the -m flag;
             note that not all FFT lengths supported for -m are available for -f).
             Optimal radix sets and timings are written to a fermat.cfg file.
                Performs as many iterations as specified via the -iters flag [required].

 -shift         ***SIMD builds only*** Bits by which to circular-left-shift the initial seed.
             This shift count is doubled (modulo the number of bits of the modulus being tested)
             each iteration. Savefile residues are rightward-shifted by the current shift count
             before being written to the file; thus savefiles contain the unshifted residue, and
             separately the current shift count, which the program uses to leftward-shift the
             savefile residue when the program is restarted from interrupt.
                The shift count is a 64-bit unsigned int (e.g. to accommodate Fermat numbers > F32).

 -prp {int}     Instead of running the rigorous primality test defined for the modulus type
             in question (Lucas-Lehmer test for Mersenne numbers, Pe'pin test for Fermat numbers
             do a probably-primality test to the specified integer base b = {int}.
                For a Mersenne number M(p), starting with initial seed x = b (which must not = 2
             or a power of 2), this means do a Fermat-PRP test, consisting of (p-2) iterations of
             form x = b*x^2 (mod M(p)) plus a final mod-squaring x = x^2 (mod M(p)), with M(p) being
             a probable-prime to base b if the result == 1.
                For a Fermat number F(m), starting with initial seed x = b (which must not = 2
             or a power of 2), this means do an Euler-PRP test (referred to as a Pe'pin test for these
             moduli), i.e. do 2^m-1 iterations of form x = b*x^2 (mod F(m)), with F(m) being not merely
             a probable prime but in fact deterministically a prime if the result == -1. The reason we
             still use the -prp flag in the Fermat case is for legacy-code compatibility: All pre-v18
             Mlucas versions supported only Pe'pin testing to base b = 3; now the user can use the -prp
             flag with a suitable base-value to override this default choice of base.

 -iters {int}   Do {int} self-test iterations of the type determined by the
             modulus-related options (-s/-m = Lucas-Lehmer test iterations with
             initial seed 4, -f = Pe'pin-test squarings with initial seed 3.

 -maxalloc {int}   Maximum-percentage of available system RAM to use per instance. Must be in [10,90], default = 90.

 -nthread {int}   For multithread-enabled builds, run with this many threads.
     If the user does not specify a thread count, the default is to run single-threaded
     with that thread's affinity set to logical core 0.

     AFFINITY: The code will attempt to set the affinity of the resulting threads
     0:n-1 to the same-indexed processor cores - whether this means distinct physical
     cores is entirely up to the CPU vendor - E.g. Intel uses such a numbering scheme
     but AMD does not. For this reason as of v17 this option is deprecated in favor of
     the -cpu flag, whose usage is detailed below, with the online README page providing
     guidance for the core-numbering schemes of popular CPU vendors.

     If n exceeds the available number of logical processor cores (call it #cpu), the
     program will halt with an error message.

     For greater control over affinity setting, use the -cpu option, which supports two
     distinct core-specification syntaxes (which may be mixed together), as follows:

     -cpu {lo[:hi[:incr]]}   (All args {int} here) Set thread/CPU affinity.
     NOTE: This flag and -nthread are mutually exclusive: If -cpu is used, the threadcount
     is inferred from the numeric-argument-triplet which follows. If only the 'lo' argument
     of the triplet is supplied, this means 'run single-threaded with affinity to CPU {lo}.'
     If the increment (third) argument of the triplet is omitted, it is taken as incr = 1.
     The CPU set encoded by the integer-triplet argument to -cpu corresponds to the
     values of the integer loop index i in the C-loop for(i = lo; i <= hi; i += incr),
     excluding the loop-exit value of i. Thus '-cpu 0:3' and '-cpu 0:3:1' are both
     exactly equivalent to '-nthread 4', whereas '-cpu 0:6:2' and '-cpu 0:7:2' both
     specify affinity setting to cores 0,2,4,6, assuming said cores exist.
     Lastly, note that no whitespace is permitted within the colon-separated numeric field.

     -cpu {triplet0[,triplet1,...]}   This is simply an extended version of the above affinity-
     setting syntax in which each of the comma-separated 'triplet' subfields is in the above
     form and, analogously to the one-triplet-only version, no whitespace is permitted within
     the colon-and-comma-separated numeric field. Thus '-cpu 0:3,8:11' and '-cpu 0:3:1,8:11:1'
     both specify an 8-threaded run with affinity set to the core quartets 0-3 and 8-11,
     whereas '-cpu 0:3:2,8:11:2' means run 4-threaded on cores 0,2,8,10. As described for the
     -nthread option, it is an error for any core index to exceed the available number of logical
     processor cores.
While the help text says -s m tests exponents 2,614,999 to 9,530,803,
the selftest log file shows 39,003,229 to 142,037,359, and mlucas.cfg shows FFT lengths 2048K to 7680K.
Apparently Ernst has adjusted the meaning of m etc. over time to keep up with a moving wavefront,
without keeping the program's help text output in sync.
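
As an aside on the -m and -prp entries in the help output above: the iteration counts they describe can be checked on tiny exponents with ordinary big-integer arithmetic. This is only a sketch in Python, nothing like Mlucas's FFT-based implementation. The function names are hypothetical, and the Lucas-Lehmer recurrence x = x^2 - 2 is the standard textbook form, which the help text names but does not spell out.

```python
def lucas_lehmer(p):
    """Lucas-Lehmer test of M(p) = 2^p - 1 for an odd prime p, starting value 4."""
    m = (1 << p) - 1
    x = 4
    for _ in range(p - 2):
        x = (x * x - 2) % m
    return x == 0  # M(p) is prime exactly when the final residue is 0


def fermat_prp(p, b=3):
    """Fermat-PRP test of M(p) to base b, in the form the -prp help text describes."""
    m = (1 << p) - 1
    x = b
    for _ in range(p - 2):
        x = (b * x * x) % m  # (p-2) iterations of x = b*x^2 (mod M(p))
    x = (x * x) % m          # plus one final mod-squaring
    return x == 1            # probable prime to base b if the result == 1
```

After the (p-2) iterations the exponent of b is 2^(p-1) - 1, so the final squaring yields b^(2^p - 2) = b^(M(p)-1), which is the usual Fermat test. It also shows why base 2 is excluded: 2^p = 1 (mod M(p)), so every M(p) with prime p passes to base 2, for instance fermat_prp(11, 2) is true although M(11) = 2047 = 23 * 89.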

Source code Mlucas.c V20.0 appears consistent with selftest:
Code:
class    fftlo(K)  ffthi(K)     plow         phigh
tiny         8       120        173431       2455003
small      128      1920       2614999      36617407
medium    2048      7680      39003229     142037359  (includes DC and first test wavefronts now)
large     8192     61440     152816047    1094833457  (exceeds mersenne.org p < 10^9 limit)
huge     65536    245760    1154422469    4197433843  (up to ~0.98 * 2^32)
/* Larger require 64-bit exponent support */
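
The table above can be turned into a quick lookup for which self-test class reaches a given exponent. This is a sketch in Python using only the numbers from the table; the names SELFTEST_CLASSES and selftest_class_for are hypothetical. It assumes plow and phigh are the largest exponents tested at fftlo and ffthi respectively, so an exponent that falls between two classes is assigned to the next class up.

```python
# (class, fftlo in K, ffthi in K, plow, phigh), copied from the table above
SELFTEST_CLASSES = [
    ("tiny",       8,    120,     173431,    2455003),
    ("small",    128,   1920,    2614999,   36617407),
    ("medium",  2048,   7680,   39003229,  142037359),
    ("large",   8192,  61440,  152816047, 1094833457),
    ("huge",   65536, 245760, 1154422469, 4197433843),
]


def selftest_class_for(p):
    """Return the first class whose largest test exponent is at least p, else None."""
    for name, _fftlo, _ffthi, _plow, phigh in SELFTEST_CLASSES:
        if p <= phigh:
            return name
    return None  # beyond 'huge'; larger exponents need 64-bit exponent support
```

Under these assumptions an exponent near 110,000,000 maps to medium, which is consistent with the remark that -s m now covers the DC and first-test wavefronts.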
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2021-08-13 at 20:07

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
