#1002 |
"Composite as Heck"
Oct 2017
827 Posts |
I converted argument parsing to getopt; it should be fully backwards compatible with the current way options are parsed: https://github.com/sillygitter/gpuowl
I haven't heavily tested it, but my Linux build appears to work. Can someone test with MinGW and MSVC? MinGW and Linux should work as is. MSVC should fail and require you to copy getopt.h from MinGW ( https://github.com/skandhurkat/Getop...aster/getopt.h ) into the build directory, and also to change #include <getopt.h> to #include "getopt.h" in Args.cpp.
Once you get the build working, there should be many ways to specify the options. For example, -f, -fft and --fft should all work and require an argument. I had to take some liberties with the short option letters, as the letters c and d are heavily overloaded.
Code:
static struct option long_options[] =
{
    /* long name, argument requirement, flag pointer, short letter returned */
    {"help",   no_argument,       0, 'h'},
    {"D",      required_argument, 0, 'D'},
    {"fft",    required_argument, 0, 'f'},
    {"dump",   required_argument, 0, 'k'},
    {"user",   required_argument, 0, 'u'},
    {"cpu",    required_argument, 0, 'n'},
    {"cl",     required_argument, 0, 'C'},
    {"time",   no_argument,       0, 't'},
    {"carry",  required_argument, 0, 'c'},
    {"block",  required_argument, 0, 'b'},
    {"device", required_argument, 0, 'd'},
    {0, 0, 0, 0} /* all-zero entry terminates the table */
};
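To see how one table entry can make -f, -fft and --fft equivalent, here is a minimal sketch. It assumes the parser calls getopt_long_only (the GNU variant that also accepts long names after a single dash), which the post does not state. The trimmed table and the helper demo_fft_arg are hypothetical and are not code from the linked repository.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/* Trimmed, hypothetical copy of the option table, for illustration only. */
static const struct option demo_options[] = {
    {"help",   no_argument,       NULL, 'h'},
    {"fft",    required_argument, NULL, 'f'},
    {"device", required_argument, NULL, 'd'},
    {"time",   no_argument,       NULL, 't'},
    {NULL, 0, NULL, 0}
};

/* Returns the argument given to -f, -fft or --fft, or NULL if none was given.
   demo_fft_arg is a made-up helper, not a function from gpuowl. */
const char *demo_fft_arg(int argc, char *argv[]) {
    const char *fft = NULL;
    int c;

    assert(argc >= 1 && argv != NULL);
    optind = 1; /* restart scanning so the helper can be called repeatedly */
    opterr = 0; /* keep getopt from printing its own diagnostics */

    /* Short letters that take an argument are followed by ':' in optstring. */
    while ((c = getopt_long_only(argc, argv, "hf:d:t", demo_options, NULL)) != -1) {
        if (c == 'f')
            fft = optarg; /* all three spellings arrive here as 'f' */
    }
    return fft;
}
```

With this sketch, the argument lists "-f +2", "-fft +2" and "--fft +2" should all yield "+2". On BSD-derived C libraries a rescan may also need optreset = 1.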
#1003 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,437 Posts |
Gpuowl V6.2 was benchmarked at many fft lengths on an RX480. Results are tabulated along with summarized results from benchmarks on earlier versions: 1.9, 2.0, 3.8 and 5.0. Each still offers some fastest fft lengths, except that v2.0's 500K transform is eclipsed by later versions' nearby fft lengths. A few fft choices would not run in V6.2: four cases of EE load error, and an "assertion failed" at very large fft length. At some lengths, earlier versions are 0.4% to 3.4% faster on this gpu. V6.2 was the fastest version at 9 fft lengths, more than any other version compared. Of those 9, 7 use the -fft +0 option and 2 use the -fft +2 option.
V1.9's power-of-two fft lengths are still the fastest, 3 out of 3. V3.8 had 5 fastest lengths; V5.0 had 2. It sure would be nice if all the fastest transforms were in one recent executable.
Benchmark method was as follows:
1) Prepare the batch file and worktodo for the version and exponent to be timed, while the gpu runs something else.
2) With minimum execution interruption, make the switch to benchmarking.
3) Ignore the next 10,000+ iterations while the gpu warms back up and stabilizes.
4) Average the time of the following 50,000 iterations (more if the computed average falls on a rounding boundary at 50,000, or if the version under test does not display 10,000-iteration timings).
5) Round as appropriate to remove unjustified precision.
Some versions were omitted from this comparison: all versions older than V1.9; V3.3, uniformly slower than v3.5; 3.5-3.7, previously found to be essentially identical in speed to 3.8; 3.9 and 4.x, found to be slower; and V6.1, since it has the prime-called-composite bug. V6.0 may be tested at a later time. Timing tests on an RX550 may be run at a later time.
Last fiddled with by kriesel on 2019-02-09 at 20:58 |
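The averaging step of the method above can be sketched as follows. This is only an illustration under stated assumptions: the program is taken to report one mean ms/iteration figure per block of 10,000 iterations, all blocks are the same size (so the mean of block means equals the overall mean), and bench_average with its parameters is a hypothetical name, not part of any gpuowl tooling.

```c
#include <assert.h>
#include <stddef.h>

/* Mean ms/iteration over the measurement window, skipping warm-up blocks.
   block_ms[i] is the ms/iteration reported for the i-th block of 10,000
   iterations (an assumed log format). Returns -1.0 if too little data. */
double bench_average(const double *block_ms, size_t n_blocks,
                     size_t warmup_blocks, size_t measure_blocks) {
    double sum = 0.0;
    size_t i;

    assert(block_ms != NULL);
    if (measure_blocks == 0 || n_blocks < warmup_blocks + measure_blocks)
        return -1.0; /* not enough blocks logged yet */

    /* Sum only the blocks that follow the warm-up period. */
    for (i = 0; i < measure_blocks; i++)
        sum += block_ms[warmup_blocks + i];
    return sum / (double)measure_blocks;
}
```

Skipping one warm-up block and averaging the next five corresponds to the 10,000 ignored and 50,000 measured iterations described in the post. Rounding is left to the caller, since the appropriate precision depends on the scatter in the data.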
#1005 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,437 Posts |
Logs here show 81885841 is one I ran with gpuowl v5.0-9c13870, PRP/P-1 combined.
Last fiddled with by kriesel on 2019-02-12 at 04:28 |
#1006 |
"Mihai Preda"
Apr 2015
10101011011₂ Posts |
That is PRP-1. mprime can't double-check them now, but there are only a few such results in total. In the future they might be double-checked when the need arises, but the confidence there is rather high. Anyway, GpuOwl is not generating these anymore.
#1007 | |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
12475₈ Posts |
Quote:
#1008 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,437 Posts |
See https://www.mersenneforum.org/showpo...7&postcount=12 for a lookup table of which gpuowl versions' files can be run in which other gpuowl versions. A run begun in V1.9 can be migrated to v6.x by way of two intermediate versions. Back migration in gpuowl is very limited: it works only within the same file version. (To make it otherwise would require fitting older versions with file format back-converters. It's understandable that Mihai has not done so.)
Last fiddled with by kriesel on 2019-02-15 at 18:26 |
#1009 |
"Marv"
May 2009
near the Tannhäuser Gate
1222₈ Posts |
I was curious about the 2 columns marked H and W when a -list fft is run on V5. For newer versions, they are shown as numbers on the far right. Displaying the log shows what they are when a new exponent is started. Also for V5, "middle 5" is displayed after them.
I couldn't find any reference to these anywhere. TIA |
#1010 | |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
12475₈ Posts |
Quote:
Mlucas does it differently, with up to 5 radices, if I recall correctly, and possibly more to come. Here's an excerpt from an mlucas.cfg file with 5 in use, and it appears there is room to go up to 10:
Code:
radices = 36 16 16 32 32 0 0 0 0 0
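For anyone wanting to compare such lines, here is a small sketch that counts the slots in use and multiplies the nonzero radices together. It assumes the "radices = ..." layout shown in the excerpt and treats the first zero as the start of the unused slots. radix_product is a made-up helper, not part of Mlucas, and how the product relates to the transform length is left for the Mlucas documentation to confirm.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Parses a line such as "radices = 36 16 16 32 32 0 0 0 0 0".
   Stores the number of nonzero radices in *count and returns their product,
   or 0 if the line has no usable "radices =" field. */
unsigned long long radix_product(const char *line, int *count) {
    unsigned long long product = 1;
    const char *p;

    assert(line != NULL && count != NULL);
    *count = 0;

    p = strstr(line, "radices");
    if (p == NULL || (p = strchr(p, '=')) == NULL)
        return 0;
    p++; /* step past '=' */

    for (;;) {
        char *end;
        unsigned long r = strtoul(p, &end, 10);
        if (end == p)
            break; /* no further number on the line */
        p = end;
        if (r == 0)
            break; /* zeros mark unused slots */
        product *= r;
        (*count)++;
    }
    return *count > 0 ? product : 0;
}
```

For the excerpt above this gives 5 radices in use and a product of 36 × 16 × 16 × 32 × 32 = 9437184.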
#1011 | |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,437 Posts |
Quote:
1) I find the aid to memory or explicit labeling useful. It's becoming more useful with age.
2) Without having thought about it much, it seemed to me that requiring 3 parameters, or coding in an assumption that when only two are given the second is B1, limits possible future features. Making them labeled and individually optional allows more cases in the future. For
-PRP <exponent> <b1bound> <b2bound>
all three would be required to be present, or b1 could be specified and b2 left up to the program if omitted, but there's no way to express "program selects b1, user specifies b2" unless -PRP <exponent> 0 <b2bound> is given special meaning. For
-PM1 <exponent> [-B1 <b1bound>] [-B2 <b2bound>] [-n 1|2|3] [etc]
specifying what you like and letting the program select the rest seems an easy code evolution path. Some default such as -n 2 could be adopted (how many primality tests are saved by a factor, and so an input for how hard to try to find a factor).
3) Batch files or shell scripts could stay compatible as more features become supported.
There's existing code in both prime95/mprime and CUDAPm1 for optimizing bounds selection in the program (to save maximal time over many P-1 attempts, given varying probabilities of finding factors by P-1 and therefore avoiding primality tests) rather than leaving it to the user to specify. I trust Mihai to consider such matters, make reasonable choices, and occasionally break with the past when it makes sufficient sense.
I'm in favor of any idea generation or discussion by us mere users that makes Mihai's time more productive. https://idioms.thefreedictionary.com...+into+a+corner
Last fiddled with by kriesel on 2019-02-21 at 19:30 |
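The labeled, individually optional form proposed above could be parsed along these lines. This is a sketch of the proposal only: the -PM1, -B1, -B2 and -n spellings come from the post, not from any released gpuowl, and struct pm1_request, parse_pm1 and the convention that 0 means "let the program choose" are hypothetical. It again assumes getopt_long_only is available.

```c
#include <assert.h>
#include <getopt.h>
#include <stdlib.h>

/* Hypothetical request record; a bound of 0 means "program selects". */
struct pm1_request {
    unsigned long exponent;
    unsigned long b1;
    unsigned long b2;
    int tests_saved; /* the proposed -n value, defaulting to 2 */
};

/* Parses: -PM1 <exponent> [-B1 <b1bound>] [-B2 <b2bound>] [-n 1|2|3]
   Returns 0 on success, -1 on an unknown option or a missing exponent. */
int parse_pm1(int argc, char *argv[], struct pm1_request *req) {
    static const struct option opts[] = {
        {"PM1", required_argument, NULL, 'P'},
        {"B1",  required_argument, NULL, '1'},
        {"B2",  required_argument, NULL, '2'},
        {NULL, 0, NULL, 0}
    };
    int c;

    assert(argv != NULL && req != NULL);
    req->exponent = 0;
    req->b1 = 0;
    req->b2 = 0;
    req->tests_saved = 2;

    optind = 1; /* restart scanning on every call */
    opterr = 0; /* report errors through the return value instead */
    while ((c = getopt_long_only(argc, argv, "n:", opts, NULL)) != -1) {
        switch (c) {
        case 'P': req->exponent = strtoul(optarg, NULL, 10); break;
        case '1': req->b1 = strtoul(optarg, NULL, 10); break;
        case '2': req->b2 = strtoul(optarg, NULL, 10); break;
        case 'n': req->tests_saved = atoi(optarg); break;
        default:  return -1; /* unknown option or missing argument */
        }
    }
    return req->exponent != 0 ? 0 : -1;
}
```

The case the positional form cannot express, "program selects b1, user specifies b2", then needs no special value on the command line: giving only -B2 leaves b1 at 0 for the program to fill in.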
#1012 |
Sep 2003
5×11×47 Posts |
Recently, PRP triple-checks were done on M78410041 and M79109357.
Unfortunately, they were done with residue type 4, whereas the original two tests were done with residue type 1. So the result is much less useful, and a new test is needed.
Presumably these were manually self-assigned triple-checks using gpuOwL? Or were they assigned automatically by some utility program? This is a drawback of different programs having different default residue types, since it places the burden on the user not to make this mistake.
PS: These were cases where Simon Cunningham found that his computers were producing bad PRP residues because the implementation of Gerbicz error checking in mprime 29.5 had some shortcomings in the final processing.