"TF79LL86GIMPS96gpu17"
Mlucas V20.0 -h help output
./Mlucas -h on its own produces abbreviated output that includes an error message. As a workaround, use ./Mlucas -h printall
The INFO portion will vary depending on the system it is run on.
There does not appear to be any P-1-specific help output available at this time.
Code:
~/mlucas_v20/obj$ ./Mlucas -h printall
Mlucas 20.0
http://www.mersenneforum.org/mayer/README.html
INFO: testing qfloat routines...
System total RAM = 16243, free RAM = 287
INFO: 287 MB of free system RAM detected; will use up to 90% = 258 MB of that, unless user specifies a lower fraction via -maxalloc.
CPU Family = x86_64, OS = Linux, 64-bit Version, compiled with Gnu C [or other compatible], Version 7.4.0.
INFO: Build uses AVX2 instruction set.
INFO: Using inline-macro form of MUL_LOHI64.
INFO: Using FMADD-based 100-bit modmul routines for factoring.
INFO: MLUCAS_PATH is set to ""
INFO: using 64-bit-significand form of floating-double rounding constant for scalar-mode DNINT emulation.
Setting DAT_BITS = 10, PAD_BITS = 2
INFO: testing IMUL routines...
INFO: System has 12 available processor cores.
INFO: testing FFT radix tables...
For the full list of command line options, run the program with the -h flag.
For a list of command-line options grouped by type, run the program with the -topic flag.
Mlucas command line options:
Symbol and abbreviation key:
<CR> : carriage return
| : separator for one-of-the-following multiple-choice menus
[] : encloses optional arguments
{} : denotes user-supplied numerical arguments of the type noted.
({int} means nonnegative integer, {+int} = positive int, {float} = float.)
-argument : Vertical stacking indicates argument short 'nickname' options,
-arg : e.g. in this example '-arg' can be used in place of '-argument'.
Supported arguments:
<CR> Default mode: looks for a worktodo.ini file in the local
directory; if none found, prompts for manual keyboard entry
Help submenus by topic. No additional arguments may follow the displayed ones:
-s Post-build self-testing for various FFT-length rnages.
-fft[len] FFT-length setting.
-radset FFT radix-set specification.
-m[ersenne] Mersenne-number primality testing.
-f[ermat] Fermat-number primality testing.
-shift ***SIMD builds only*** Number of bits by which to shift the initial seed (= iteration-0 residue).
-prp Probable-primality testing mode.
-iters Iteration-number setting.
-nthread|cpu Setting threadcount and CPU core affinity.
-maxalloc Setting maximum-percentage of available system RAM to use per instance.
*** NOTE: *** The following self-test options will cause an mlucas.cfg file containing
the optimal FFT radix set for the runlength(s) tested to be created (if one did not
exist previously) or appended (if one did) with new timing data. Such a file-write is
triggered by each complete set of FFT radices available at a given FFT length being
tested, i.e. by a self-test without a user-specified -radset argument.
(A user-specific Mersenne exponent may be supplied via the -m flag; if none is specified,
the program will use the largest permissible exponent for the given FFT length, based on
its internal length-setting algorithm). The user must specify the number of iterations for
the self-test via the -iters flag; while it is not required, it is strongly recommended to
stick to one of the standard timing-test values of -iters = [100,1000,10000], with the larger
values being preferred for multithreaded timing tests, in order to assure a decently large
slice of CPU time. Similarly, it is recommended to not use the -m flag for such tests, unless
roundoff error levels on a given compute platform are such that the default exponent at one or
more FFT lengths of interest prevents a reasonable sampling of available radix sets at same.
If the user lets the program set the exponent and uses one of the aforementioned standard
self-test iteration counts, the resulting best-timing FFT radix set will only be written to the
resulting mlucas.cfg file if the timing-test result matches the internally- stored precomputed
one for the given default exponent at the iteration count in question, with eligible radix sets
consisting of those for which the roundoff error remains below an acceptable threshold.
If the user instead specifies the exponent (only allowed for a single-FFT-length timing test)****************
and/or a non-default iteration number, the resulting best-timing FFT radix set will only be
written to the resulting mlucas.cfg file if the timing-test results match each other? ********* check logic here *******
This is important for tuning code parameters to your particular platform.
FOR BEST RESULTS, RUN ANY SELF-TESTS UNDER ZERO- OR CONSTANT-LOAD CONDITIONS
-s {...} Self-test, user must also supply exponent [via -m or -f] and/or FFT length to use.
-s tiny Runs 100-iteration self-tests on set of 32 Mersenne exponents, ranging from 173431 to 2455003
-s t This will take around 1 minute on a fast CPU..
-s small Runs 100-iteration self-tests on set of 32 Mersenne exponents, ranging from 173431 to 2455003
-s s This will take around 10 minutes on a fast CPU..
**** THIS IS THE ONLY SELF-TEST ORDINARY USERS ARE RECOMMENDED TO DO: ******
* *
* -s medium Runs set of 16 Mersenne exponents, ranging from 2614999 to 9530803
* -s m This will take around an hour on a fast CPU. *
* *
****************************************************************************
-s large Runs set of 24 Mersenne exponents, ranging from 10151971 to 72123137
-s l This will take around an hour on a fast CPU.
-s huge Runs set of 16 Mersenne exponents, ranging from 76821337 to 282508657
-s h This will take a couple of hours on a fast CPU.
-s all Runs 100-iteration self-tests of all test Mersenne exponents and all FFT radix sets.
-s a This will take several hours on a fast CPU.
-fft[len] {+int} If {+int} is one of the available FFT lengths (in Kilodoubles), runs all
all available FFT radices available at that length, unless the -radset flag is
invoked (see below for details). If -fft is invoked without the -iters flag,
it is assumed the user wishes to do a production run with a non-default FFT length,
In this case the program requires a valid worktodo.ini-file entry with exponent
not more than 5% larger than the default maximum for that FFT length.
If -fft is invoked with a user-supplied value of -iters but without a
user-supplied exponent, the program will do the specified number of iterations
using the default self-test Mersenne or Fermat exponent for that FFT length.
If -fft is invoked with a user-supplied value of -iters and either the
-m or -f flag and a user-supplied exponent, the program will do the specified
number of iterations of either the Lucas-Lehmer test with starting value 4 (-m)
or the Pe'pin test with starting value 3 (-f) on the user-specified modulus.
In either of the latter 2 cases, the program will produce a cfg-file entry based
on the timing results, assuming at least one radix set ran the specified #iters
to completion without suffering a fatal error of some kind.
Use this to find the optimal radix set for a single FFT length on your hardware.
NOTE: IF YOU USE OTHER THAN THE DEFAULT MODULUS OR #ITERS FOR SUCH A SINGLE-FFT-
LENGTH TIMING TEST, IT IS UP TO YOU TO MANUALLY VERIFY THAT THE RESIDUES OUTPUT
MATCH FOR ALL FFT RADIX COMBINATIONS AND THE ROUNDOFF ERRORS ARE REASONABLE!
-radset {int} Specific index of a set of complex FFT radices to use, based on the big
select table in the function get_fft_radices(). Requires a supported value of
-fft to also be specified, as well as a value of -iters for the timing test.
-m [{+int}] Performs a Lucas-Lehmer primality test of the Mersenne number M(int) = 2^int - 1,
where int must be an odd prime. If -iters is also invoked, this indicates a timing test.
and requires suitable added arguments (-fft and, optionally, -radset) to be supplied.
If the -fft option (and optionally -radset) is also invoked but -iters is not, the
program first checks the first line of the worktodo.ini file to see if the assignment
specified there is a Lucas-Lehmer test with the same exponent as specified via the -m
argument. If so, the -fft argument is treated as a user override of the default FFT
length for the exponent. If -radset is also invoked, this is similarly treated as a user-
specified radix set for the user-set FFT length; otherwise the program will use the cfg file
to select the radix set to be used for the user-forced FFT length.
If the worktodo.ini file entry does not match the -m value, a set of timing self-tests is
run on the user-specified Mersenne number using all sets of FFT radices available at the
specified FFT length.
If the -fft option is not invoked, the self-tests use all sets of
FFT radices available at that exponent's default FFT length.
Use this to find the optimal radix set for a single given Mersenne number
exponent on your hardware, similarly to the -fft option.
Performs as many iterations as specified via the -iters flag [required].
-f {int} Performs a base-3 Pe'pin test on the Fermat number F(num) = 2^(2^num) + 1.
If desired this can be invoked together with the -fft option.
as for the Mersenne-number self-tests (see notes about the -m flag;
note that not all FFT lengths supported for -m are available for -f).
Optimal radix sets and timings are written to a fermat.cfg file.
Performs as many iterations as specified via the -iters flag [required].
-shift ***SIMD builds only*** Bits by which to circular-left-shift the initial seed.
This shift count is doubled (modulo the number of bits of the modulus being tested)
each iteration. Savefile residues are rightward-shifted by the current shift count
before being written to the file; thus savefiles contain the unshifted residue, and
separately the current shift count, which the program uses to leftward-shift the
savefile residue when the program is restarted from interrupt.
The shift count is a 64-bit unsigned int (e.g. to accommodate Fermat numbers > F32).
-prp {int} Instead of running the rigorous primality test defined for the modulus type
in question (Lucas-Lehmer test for Mersenne numbers, Pe'pin test for Fermat numbers
do a probably-primality test to the specified integer base b = {int}.
For a Mersenne number M(p), starting with initial seed x = b (which must not = 2
or a power of 2), this means do a Fermat-PRP test, consisting of (p-2) iterations of
form x = b*x^2 (mod M(p)) plus a final mod-squaring x = x^2 (mod M(p)), with M(p) being
a probable-prime to base b if the result == 1.
For a Fermat number F(m), starting with initial seed x = b (which must not = 2
or a power of 2), this means do an Euler-PRP test (referred to as a Pe'pin test for these
moduli), i.e. do 2^m-1 iterations of form x = b*x^2 (mod F(m)), with F(m) being not merely
a probable prime but in fact deterministically a prime if the result == -1. The reason we
still use the -prp flag in the Fermat case is for legacy-code compatibility: All pre-v18
Mlucas versions supported only Pe'pin testing to base b = 3; now the user can use the -prp
flag with a suitable base-value to override this default choice of base.
-iters {int} Do {int} self-test iterations of the type determined by the
modulus-related options (-s/-m = Lucas-Lehmer test iterations with
initial seed 4, -f = Pe'pin-test squarings with initial seed 3.
-maxalloc {int} Maximum-percentage of available system RAM to use per instance. Must be in [10,90], default = 90.
-nthread {int} For multithread-enabled builds, run with this many threads.
If the user does not specify a thread count, the default is to run single-threaded
with that thread's affinity set to logical core 0.
AFFINITY: The code will attempt to set the affinity of the resulting threads
0:n-1 to the same-indexed processor cores - whether this means distinct physical
cores is entirely up to the CPU vendor - E.g. Intel uses such a numbering scheme
but AMD does not. For this reason as of v17 this option is deprecated in favor of
the -cpu flag, whose usage is detailed below, with the online README page providing
guidance for the core-numbering schemes of popular CPU vendors.
If n exceeds the available number of logical processor cores (call it #cpu), the
program will halt with an error message.
For greater control over affinity setting, use the -cpu option, which supports two
distinct core-specification syntaxes (which may be mixed together), as follows:
-cpu {lo[:hi[:incr]]} (All args {int} here) Set thread/CPU affinity.
NOTE: This flag and -nthread are mutually exclusive: If -cpu is used, the threadcount
is inferred from the numeric-argument-triplet which follows. If only the 'lo' argument
of the triplet is supplied, this means 'run single-threaded with affinity to CPU {lo}.'
If the increment (third) argument of the triplet is omitted, it is taken as incr = 1.
The CPU set encoded by the integer-triplet argument to -cpu corresponds to the
values of the integer loop index i in the C-loop for(i = lo; i <= hi; i += incr),
excluding the loop-exit value of i. Thus '-cpu 0:3' and '-cpu 0:3:1' are both
exactly equivalent to '-nthread 4', whereas '-cpu 0:6:2' and '-cpu 0:7:2' both
specify affinity setting to cores 0,2,4,6, assuming said cores exist.
Lastly, note that no whitespace is permitted within the colon-separated numeric field.
-cpu {triplet0[,triplet1,...]} This is simply an extended version of the above affinity-
setting syntax in which each of the comma-separated 'triplet' subfields is in the above
form and, analogously to the one-triplet-only version, no whitespace is permitted within
the colon-and-comma-separated numeric field. Thus '-cpu 0:3,8:11' and '-cpu 0:3:1,8:11:1'
both specify an 8-threaded run with affinity set to the core quartets 0-3 and 8-11,
whereas '-cpu 0:3:2,8:11:2' means run 4-threaded on cores 0,2,8,10. As described for the
-nthread option, it is an error for any core index to exceed the available number of logical
processor cores.
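The -cpu triplet syntax described in the help text above can be sketched in a few lines of Python. This is only an illustration of the documented lo[:hi[:incr]] semantics, not Mlucas's actual parser; the function name expand_cpu_spec is hypothetical, and rejecting hi < lo or a non-positive increment is an assumption, since the help text does not say how those cases are handled.

```python
def expand_cpu_spec(spec):
    """Expand a -cpu style argument such as '0:3,8:11' into a list of core indices.

    Illustrative only: mirrors the C loop for(i = lo; i <= hi; i += incr)
    quoted in the help text. Not taken from the Mlucas source.
    """
    if any(ch.isspace() for ch in spec):
        raise ValueError("no whitespace is permitted within the numeric field")
    cores = []
    for triplet in spec.split(","):
        fields = [int(f) for f in triplet.split(":")]
        if not 1 <= len(fields) <= 3:
            raise ValueError("expected lo[:hi[:incr]], got %r" % triplet)
        lo = fields[0]
        hi = fields[1] if len(fields) > 1 else lo    # 'lo' alone means a single core
        incr = fields[2] if len(fields) > 2 else 1   # omitted increment defaults to 1
        if incr < 1 or hi < lo:
            # Assumption: treat these as errors; the help text is silent on them.
            raise ValueError("invalid triplet %r" % triplet)
        cores.extend(range(lo, hi + 1, incr))
    return cores


if __name__ == "__main__":
    for example in ("0:3", "0:6:2", "0:7:2", "0:3,8:11", "0:3:2,8:11:2"):
        print(example, "->", expand_cpu_spec(example))
```

The thread count is simply the length of the returned list, which is why the help text says -cpu and -nthread are mutually exclusive.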
The help text says -s m tests exponents 2,614,999 to 9,530,803, but the selftest log file shows 39,003,229 to 142,037,359, and mlucas.cfg shows FFT lengths 2048K to 7680K.
Apparently Ernst has adjusted the meaning of m etc. over time to keep up with the moving wavefront,
without keeping the program's help text in sync.
Source code Mlucas.c V20.0 appears consistent with selftest:
Code:
class   fftlo(K)  ffthi(K)  plow        phigh
tiny    8         120       173431      2455003
small   128       1920      2614999     36617407
medium  2048      7680      39003229    142037359   (includes DC and first test wavefronts now)
large   8192      61440     152816047   1094833457  (exceeds mersenne.org p < 10^9 limit)
huge    65536     245760    1154422469  4197433843  (up to ~0.98 * 2^32)
/* Larger require 64-bit exponent support */
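For quick lookups, the table above can be dropped into a small Python helper. This is a convenience sketch only: the numbers are copied from the table in this post, while the names SELFTEST_CLASSES, class_for_fft_length and class_for_exponent are made up here and do not appear in Mlucas.c.

```python
# (class, fftlo_K, ffthi_K, plow, phigh), copied from the table in this post.
SELFTEST_CLASSES = [
    ("tiny",   8,     120,    173431,     2455003),
    ("small",  128,   1920,   2614999,    36617407),
    ("medium", 2048,  7680,   39003229,   142037359),
    ("large",  8192,  61440,  152816047,  1094833457),
    ("huge",   65536, 245760, 1154422469, 4197433843),
]


def class_for_fft_length(fft_k):
    """Return the self-test class whose FFT-length range (in K) contains fft_k, else None."""
    for name, fft_lo, fft_hi, _plow, _phigh in SELFTEST_CLASSES:
        if fft_lo <= fft_k <= fft_hi:
            return name
    return None


def class_for_exponent(p):
    """Return the self-test class whose exponent range contains p, else None."""
    for name, _fft_lo, _fft_hi, plow, phigh in SELFTEST_CLASSES:
        if plow <= p <= phigh:
            return name
    return None


if __name__ == "__main__":
    # The help text's stated "-s m" upper bound now falls in the "small" class.
    print(class_for_exponent(9530803))
    print(class_for_exponent(39003229))
    print(class_for_fft_length(2048))
```

Exponents that fall in the gaps between classes return None, since the table lists only each class's lowest and highest test exponent.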
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Last fiddled with by kriesel on 2021-08-13 at 20:07