mersenneforum.org  

Old 2019-02-15, 18:06   #12
kriesel
 
kriesel's Avatar
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

31·173 Posts
Default gpuowl PRP continuation compatibility; LL

The ability to continue a gpuowl PRP3 primality test in various versions, relative to the version that saved the interim file, is tabulated along with the file version number as given in the file header. All PRP tests for the attachment were made on an RX480 on Win7, many with exponent 77230663.
Early versions, before v0.7, do only LL. To my knowledge there is no continuation compatibility between any LL-only version and any PRP version of gpuowl. Nor are all LL-only versions mutually compatible; v0.5 uses a random offset while v0.6 requires a zero offset.

V7.1 is not compatible with v7.0 save files; finish ongoing work in v7.0 (or presumably earlier) before switching to v7.1 (or presumably later). https://www.mersenneforum.org/showpo...&postcount=110
There appear to be some boundaries across which work in progress cannot be migrated.


Top of this reference thread: https://www.mersenneforum.org/showthread.php?t=23391
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: pdf gpuowl PRP continuation compatibility.pdf (49.3 KB, 51 views)

Last fiddled with by kriesel on 2021-03-23 at 14:01 Reason: updated attachment, clarified LL to LL-only versions.
kriesel is offline  
Old 2019-02-24, 04:10   #13
kriesel
 
Default Validation or verification runs.

One way to test a new version of prime-finding software is to rerun tests on known primes.
https://www.mersenneforum.org/showpo...0&postcount=44 lists several such validation runs made for various versions of gpuowl.
See also the attachment at https://www.mersenneforum.org/showpo...83&postcount=8 for verifications/validations of all GIMPS-found Mersenne primes with gpuowl V5.0-9c13870.
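As a toy illustration of this kind of validation run (in Python rather than gpuowl, and only practical at tiny exponents), the Lucas-Lehmer test can be rerun against exponents of known character:

```python
# Lucas-Lehmer test: M_p = 2^p - 1 is prime iff s_{p-2} == 0,
# where s_0 = 4 and s_{k+1} = s_k^2 - 2 (mod M_p).
def lucas_lehmer(p: int) -> bool:
    if p == 2:
        return True  # M_2 = 3 is prime; the s-sequence form needs p > 2
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0

# Validation run over small exponents with known status.
known_prime_exponents = [2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127]
for p in known_prime_exponents:
    assert lucas_lehmer(p), f"M_{p} should test prime"
assert not lucas_lehmer(11)  # M_11 = 2047 = 23 * 89 is composite
```

A real validation run does the same thing at full scale: the residue of a known prime's exponent must come out indicating prime, or the software/hardware combination is suspect.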


Top of this reference thread: https://www.mersenneforum.org/showthread.php?t=23391
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2020-02-20 at 20:39
Old 2019-05-26, 18:47   #14
kriesel
 
Default gpuowl-win V6.5-c48d46f run times on AMD and NVIDIA vs. CUDALucas

Exponents were selected for the 4608K fft length (current first-test wavefront) and 18432K (100Mdigit) timing runs, on the available fft options in gpuowl and CUDALucas. Timings should be representative: for a given fft length, runs were made as much as possible on the identical gpu and in the fastest possible succession, so they underwent the same system and environmental influences (system and ambient temperature). (CUDALucas can't run on AMD, so there is no comparison data for the RX550 or RX480.) Tabulated timings are averages of multiple 20,000-iteration console outputs, taken after timings appeared to have stabilized. These cover consecutive iterations, not the same iteration interval for every timing obtained; the save file was moved from gpu to gpu to accumulate progress toward completion of the test exponents. Given the large number of iterations timed and the observed stability of timing versus iteration number, there is little timing variation attributable to iteration number, believed to be <0.1% based on an overnight run.

Observed speed advantage of gpuowl over CUDALucas on the same gpu unit ranged from 0.2% to 7.5%, averaging about 2.7%. In this limited set of exponent cases, the only instance of CUDALucas being faster on the same fft length and gpu model was a very slight 0.3% advantage on a laptop gtx1050Ti.
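For reference, the percent-advantage figures come from per-iteration timings like so (a sketch; the ms/iteration values below are invented for illustration, not measured):

```python
def percent_advantage(t_cudalucas_ms: float, t_gpuowl_ms: float) -> float:
    # Speed advantage of gpuowl over CUDALucas on the same gpu and fft length,
    # from average ms per iteration over many 20,000-iteration intervals.
    return (t_cudalucas_ms / t_gpuowl_ms - 1.0) * 100.0

print(round(percent_advantage(10.27, 10.00), 1))  # 2.7 (percent)
```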

Note that some previous versions of gpuowl have shown slightly better timings than these on the AMD gpus testable in those versions. It's possible those were due to environmental differences.

Note that gpuowl has since undergone considerable optimization and is now much faster, with v6.11-380 typically fastest in my incomplete speed sampling. V7.2-53 also does well. CUDALucas is no longer maintained and has not kept pace with gpuowl on the same hardware.


Top of this thread: https://www.mersenneforum.org/showthread.php?t=23391
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2021-03-17 at 14:45 Reason: added gpuowl optimization comment
Old 2019-06-19, 17:55   #15
kriesel
 
Default gpuowl residue type etc versus version

There are several PRP residue types. (See https://www.mersenneforum.org/showpo...32&postcount=8) For the res64 to match between two primality tests of the same exponent, the residue types must match, unless perhaps the corresponding Mersenne number is prime. (Even if the Mersenne number were prime, the residues would still not match if one primality test was PRP3 and the other PRP-1.)
The PRP residue type(s) supported vary by gpuowl version.
When double checking an existing primality test made by prime95 or other software, a gpuowl version must be used whose residue type matches that of the first test.

Early versions (up through v0.6) implemented the Lucas-Lehmer test instead.

A table of residue type(s) versus gpuowl version and some additional info follows in the pdf attachment.
In a nutshell: If you want a versatile selection of fft lengths and PRP type 4 residues, choose from v4.3 to v6.5-f34ad18. If you want PRP type 1 residues, choose from v6.5-84-30c0508 and more recent, or possibly if they're faster and have suitable fft lengths, v1.1-v3.9.
Available fft lengths versus gpuowl versions are listed at https://www.mersenneforum.org/showpo...36&postcount=9 and some of these also indicate maximum exponent per fft length.
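For tiny exponents the type mismatch is easy to demonstrate directly. A sketch, assuming the common GIMPS definitions for N = 2^p - 1 (type 1: 3^(N-1) mod N; type 4: 3^((N+1)/2) mod N) — these definitions are my reading of the referenced post, so treat them as an assumption:

```python
RES64_MASK = 0xFFFFFFFFFFFFFFFF

def res64(p: int, residue_type: int) -> int:
    # Low 64 bits of the final PRP residue for M_p, base 3.
    # Assumed type definitions: 1 -> 3^(N-1) mod N, 4 -> 3^((N+1)/2) mod N.
    n = (1 << p) - 1
    exponent = {1: n - 1, 4: (n + 1) // 2}[residue_type]
    return pow(3, exponent, n) & RES64_MASK

# M_23 is composite: the two types disagree, so a double check must be
# run with software producing the same residue type as the first test.
assert res64(23, 1) != res64(23, 4)

# M_13 is prime: the type-1 (Fermat) residue is exactly 1.
assert res64(13, 1) == 1
```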


Top of this reference thread: https://www.mersenneforum.org/showthread.php?t=23391
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: pdf gpuowl versions and residue types.pdf (13.7 KB, 217 views)

Last fiddled with by kriesel on 2020-04-02 at 18:44 Reason: updated/expanded table, added LL early versions; added v6.5 type 1 commit
Old 2019-08-12, 16:21   #16
kriesel
 
Default Gpuowl v6.5-84-g30c0508 -h help output

For a different version of gpuowl, run gpuowl -h or gpuowl-win -h, or see the post at the relevant link reachable from http://www.mersenneforum.org/showpos...39&postcount=4, which may include help output.
This build was made for Windows and tried on an RX480; -h works, but -? doesn't without an existing worktodo.txt.
Code:
>gpuowl-win -h
2019-07-10 10:29:22 gpuowl v6.5-84-g30c0508

Command line options:

-dir <folder>      : specify work directory (containing worktodo.txt, results.txt, config.txt, gpuowl.log)
-user <name>       : specify the user name.
-cpu  <name>       : specify the hardware name.
-time              : display kernel profiling information.
-fft <size>        : specify FFT size, such as: 5000K, 4M, +2, -1.
-block <value>     : PRP GEC block size. Default 1000. Smaller block is slower but detects errors sooner.
-log <step>        : log every <step> iterations, default 20000. Multiple of 10000.
-carry long|short  : force carry type. Short carry may be faster, but requires high bits/word.
-B1                : P-1 B1 bound, default 500000
-B2                : P-1 B2 bound, default B1 * 30
-rB2               : ratio of B2 to B1. Default 30, used only if B2 is not explicitly set
-prp <exponent>    : run a single PRP test and exit, ignoring worktodo.txt
-pm1 <exponent>    : run a single P-1 test and exit, ignoring worktodo.txt
-results <file>    : name of results file, default 'results.txt'
-iters <N>         : run next PRP test for <N> iterations and exit. Multiple of 10000.
-use NEW_FFT8,OLD_FFT5,NEW_FFT10: comma separated list of defines, see the #if tests in gpuowl.cl (used for perf tuning).
-device <N>        : select a specific device:
 0 : Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
 1 : gfx804-8x1203-@3:0.0 Radeon 550 Series

FFT Configurations:
FFT    8K [  0.01M -    0.18M]  64-64
FFT   32K [  0.05M -    0.68M]  64-256 256-64
FFT   64K [  0.10M -    1.34M]  64-512 512-64
FFT  128K [  0.20M -    2.63M]  1K-64 64-1K 256-256
FFT  192K [  0.29M -    3.91M]  64-256-6
FFT  224K [  0.34M -    4.54M]  64-256-7
FFT  256K [  0.39M -    5.18M]  64-2K 256-512 512-256 2K-64
FFT  288K [  0.44M -    5.81M]  64-256-9
FFT  320K [  0.49M -    6.44M]  64-256-10
FFT  352K [  0.54M -    7.06M]  64-256-11
FFT  384K [  0.59M -    7.69M]  64-256-12 64-512-6
FFT  448K [  0.69M -    8.94M]  64-512-7
FFT  512K [  0.79M -   10.18M]  1K-256 256-1K 512-512 4K-64
FFT  576K [  0.88M -   11.42M]  64-512-9
FFT  640K [  0.98M -   12.66M]  64-512-10
FFT  704K [  1.08M -   13.89M]  64-512-11
FFT  768K [  1.18M -   15.12M]  64-512-12 64-1K-6 256-256-6
FFT  896K [  1.38M -   17.57M]  64-1K-7 256-256-7
FFT    1M [  1.57M -   20.02M]  1K-512 256-2K 512-1K 2K-256
FFT 1152K [  1.77M -   22.45M]  64-1K-9 256-256-9
FFT 1280K [  1.97M -   24.88M]  64-1K-10 256-256-10
FFT 1408K [  2.16M -   27.31M]  64-1K-11 256-256-11
FFT 1536K [  2.36M -   29.72M]  64-1K-12 64-2K-6 256-256-12 256-512-6 512-256-6
FFT 1792K [  2.75M -   34.54M]  64-2K-7 256-512-7 512-256-7
FFT    2M [  3.15M -   39.34M]  1K-1K 512-2K 2K-512 4K-256
FFT 2304K [  3.54M -   44.13M]  64-2K-9 256-512-9 512-256-9
FFT 2560K [  3.93M -   48.90M]  64-2K-10 256-512-10 512-256-10
FFT 2816K [  4.33M -   53.66M]  64-2K-11 256-512-11 512-256-11
FFT    3M [  4.72M -   58.41M]  1K-256-6 64-2K-12 256-512-12 256-1K-6 512-256-12 512-512-6
FFT 3584K [  5.51M -   67.87M]  1K-256-7 256-1K-7 512-512-7
FFT    4M [  6.29M -   77.30M]  1K-2K 2K-1K 4K-512
FFT 4608K [  7.08M -   86.70M]  1K-256-9 256-1K-9 512-512-9
FFT    5M [  7.86M -   96.07M]  1K-256-10 256-1K-10 512-512-10
FFT 5632K [  8.65M -  105.41M]  1K-256-11 256-1K-11 512-512-11
FFT    6M [  9.44M -  114.74M]  1K-256-12 1K-512-6 256-1K-12 256-2K-6 512-512-12 512-1K-6 2K-256-6
FFT    7M [ 11.01M -  133.32M]  1K-512-7 256-2K-7 512-1K-7 2K-256-7
FFT    8M [ 12.58M -  151.83M]  2K-2K 4K-1K
FFT    9M [ 14.16M -  170.28M]  1K-512-9 256-2K-9 512-1K-9 2K-256-9
FFT   10M [ 15.73M -  188.68M]  1K-512-10 256-2K-10 512-1K-10 2K-256-10
FFT   11M [ 17.30M -  207.02M]  1K-512-11 256-2K-11 512-1K-11 2K-256-11
FFT   12M [ 18.87M -  225.32M]  1K-512-12 1K-1K-6 256-2K-12 512-1K-12 512-2K-6 2K-256-12 2K-512-6 4K-256-6
FFT   14M [ 22.02M -  261.80M]  1K-1K-7 512-2K-7 2K-512-7 4K-256-7
FFT   16M [ 25.17M -  298.13M]  4K-2K
FFT   18M [ 28.31M -  334.34M]  1K-1K-9 512-2K-9 2K-512-9 4K-256-9
FFT   20M [ 31.46M -  370.44M]  1K-1K-10 512-2K-10 2K-512-10 4K-256-10
FFT   22M [ 34.60M -  406.43M]  1K-1K-11 512-2K-11 2K-512-11 4K-256-11
FFT   24M [ 37.75M -  442.34M]  1K-1K-12 1K-2K-6 512-2K-12 2K-512-12 2K-1K-6 4K-256-12 4K-512-6
FFT   28M [ 44.04M -  513.91M]  1K-2K-7 2K-1K-7 4K-512-7
FFT   36M [ 56.62M -  656.22M]  1K-2K-9 2K-1K-9 4K-512-9
FFT   40M [ 62.91M -  727.03M]  1K-2K-10 2K-1K-10 4K-512-10
FFT   44M [ 69.21M -  797.64M]  1K-2K-11 2K-1K-11 4K-512-11
FFT   48M [ 75.50M -  868.07M]  1K-2K-12 2K-1K-12 2K-2K-6 4K-512-12 4K-1K-6
FFT   56M [ 88.08M - 1008.44M]  2K-2K-7 4K-1K-7
FFT   72M [113.25M - 1287.53M]  2K-2K-9 4K-1K-9
FFT   80M [125.83M - 1426.38M]  2K-2K-10 4K-1K-10
FFT   88M [138.41M - 1564.83M]  2K-2K-11 4K-1K-11
FFT   96M [150.99M - 1702.92M]  2K-2K-12 4K-1K-12 4K-2K-6
FFT  112M [176.16M - 1978.12M]  4K-2K-7
FFT  144M [226.49M - 2525.23M]  4K-2K-9
FFT  160M [251.66M - 2797.39M]  4K-2K-10
FFT  176M [276.82M - 3068.76M]  4K-2K-11
FFT  192M [301.99M - 3339.40M]  4K-2K-12
2019-07-10 10:29:30 Exiting because "help"
2019-07-10 10:29:30 Bye

>gpuowl-win -?
2019-07-10 10:29:43 gpuowl v6.5-84-g30c0508
2019-07-10 10:29:43 Note: no config.txt file found
2019-07-10 10:29:43 config: -?
2019-07-10 10:29:43 Can't open 'worktodo.txt' (mode 'rb')
2019-07-10 10:29:43 Bye
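The maximum-exponent column in the table above works out to a roughly constant (slowly shrinking) number of exponent bits per fft word. A quick check of a few rows (the bits/word values are derived from the table itself, not from gpuowl source):

```python
def bits_per_word(fft_kwords: float, max_exponent: float) -> float:
    # fft length is given in units of 1024 words (K).
    return max_exponent / (fft_kwords * 1024)

# (fft length in K-words, max exponent) rows read from the help output above.
rows = [(4608, 86.70e6), (18 * 1024, 334.34e6), (192 * 1024, 3339.40e6)]
for k, max_exp in rows:
    bpw = bits_per_word(k, max_exp)
    assert 16.0 < bpw < 19.0  # ~18.4 at 4608K, shrinking toward ~16.6 at 192M
```

The shrinkage with fft length reflects tighter roundoff limits at larger transforms.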
Top of this reference thread: https://www.mersenneforum.org/showthread.php?t=23391
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2020-02-20 at 20:38
Old 2019-09-17, 00:46   #17
kriesel
 
Default Gpuowl P-1 run time scaling on AMD and NVIDIA

Various versions, v6.6 and up, have been run on a variety of exponents, up to or approaching limits that are gpu-specific and probably also somewhat gpuowl-version-specific. These runs were mostly on Windows 7 Pro x64 systems, some with as little as 12 GB of system ram. The Tesla P100 and K80 sets were done on Google Colaboratory Linux VMs with a gpuowl build Fan Ming had posted.


Top of this reference thread: https://www.mersenneforum.org/showthread.php?t=23391
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: pdf Gpuowl 6.6 P-1 on RX480.pdf (19.7 KB, 180 views)
File Type: pdf Gpuowl 6.7 P-1 on GTX1080Ti.pdf (22.3 KB, 186 views)
File Type: pdf Gpuowl 6.11 P-1 on Tesla P100.pdf (23.3 KB, 165 views)
File Type: pdf Gpuowl 6.11 P-1 on Tesla K80.pdf (24.0 KB, 192 views)
File Type: pdf Gpuowl 6.11 P-1 on Radeon VII.pdf (83.8 KB, 71 views)

Last fiddled with by kriesel on 2021-01-13 at 19:40 Reason: Radeon VII data updated with GHzD/day column, graphs vs. fft length & exponent, note
Old 2019-11-06, 13:16   #18
kriesel
 
Default Gerbicz error check detection rate

In a post at the link following, Robert Gerbicz indicates a rate of ~12 ppm of missed errors in a PRP3 computation. That was at a very small p=17. If that rate held constant over the mersenne.org range (2 to 10^9), and the error occurrence rate continued at 2% per test, it would correspond to about 12 undetected wrong residues among the approximately 50 million prime exponents (12 ppm × 50 million exponents × 2% error occurrence). The rate of undetected error with GEC drops rapidly with p, so it is nonzero but very nearly zero in the range of p of interest to GIMPS (x < 2/(2p-1)). It's very good, far better than the Jacobi symbol check's 50% detection rate for LL, and either is much better than nothing, but it's not perfect. https://www.mersenneforum.org/showpo...1&postcount=88
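The arithmetic behind the "~12 undetected wrong residues" figure, as a sanity check:

```python
missed_fraction = 12e-6    # ~12 ppm missed-error rate from the referenced post (at p=17)
error_per_test = 0.02      # ~2% of tests encounter at least one error
prime_exponents = 50e6     # approximate count of prime exponents below 10^9

undetected = missed_fraction * error_per_test * prime_exponents
assert abs(undetected - 12.0) < 1e-6  # ~12, if the p=17 rate held (it drops rapidly with p)
```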

And at https://www.mersenneforum.org/showpo...&postcount=104, R. Gerbicz indicates a probability of multiple errors in a block of ~1/Mp.

Also note that errors in code or hardware function outside the Gerbicz check's protection add to the overall error rate, as prime95 found. (The possibilities of human error or deceit add additional possible error in reported residues to Primenet, and these probabilities are hard to quantify and control, other than by independent checking.)

GIMPS policy appears to be that for both LL and PRP tests, double checking of such tests will remain required, except for PRP tests with proof and verification (Cert) performed.


Top of this reference thread: https://www.mersenneforum.org/showthread.php?t=23391
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2020-10-04 at 16:37 Reason: qualified DC, for PRP proof/cert exception
Old 2019-11-09, 13:14   #19
kriesel
 
Default Increased throughput with simultaneous runs

See https://www.mersenneforum.org/showpo...8&postcount=99 for results of some experimentation with simultaneous PRP runs or PRP & P-1 tandem runs.
In a nutshell: 107.5 to 108% of the throughput of a single run, at 106% power consumption.
The tradeoff is a little more throughput and slightly better power efficiency, at nearly double the latency per test. Note that this performance boost may be specific to the Linux ROCm drivers, and much reduced or absent with amdgpu-pro or Windows. Test in your environment.
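The "nearly double latency" follows directly from the throughput figures:

```python
aggregate = 1.075                  # two simultaneous runs, ~107.5% of single-run throughput
per_run_speed = aggregate / 2      # each run proceeds at ~53.75% of single-run speed
latency_factor = 1 / per_run_speed
assert 1.8 < latency_factor < 1.9  # ~1.86x the single-run wall time per test
```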

Not tested on the Radeon VII with gpuowl, but likely practical, is phasing two P-1 runs so that instance A is in stage 1, with its low memory requirements, while instance B runs stage 2, with its high memory requirements. At modest exponents it may be possible to run two stage 2s in parallel (simultaneously), as is possible in CUDAPm1, on gpus with less ram than the Radeon VII's 16GB.

In the months since, considerable effort has gone into alternate code paths in gpuowl to optimize speed, so single-instance performance has improved and the dual-instance throughput advantage has shrunk. Memlock support was also added, to help make tandem P-1 more practical and automated. In v7, stage 1 uses more gpu ram than previously.


Top of this reference thread: https://www.mersenneforum.org/showthread.php?t=23391
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2021-03-17 at 14:47
Old 2019-11-21, 00:18   #20
kriesel
 
Default What's a good P-1 factoring strategy? Best?

Multiple P-1 applications offer the user the option of controlling the P-1 factoring bounds, or require their input, on the command line or in worktodo records or both. What inputs a user should choose depends on the goal. For the purposes of this post, the goal is to maximize the time saved, on average over many exponents, in searching for a new Mersenne prime. Finding factors is nice, but it is not the goal.

Several strategies have been discussed on some forum threads. Some advocate testing with 1-test-saved bounds. Others advocate quickly testing many exponents with very small bounds, claiming it will save time by eliminating some exponents quickly, before running a lengthier factoring with larger bounds. There are multiple bounds given on the mersenne.ca page for a single exponent. What saves the most time in the long run has been unclear, and to my knowledge, not well tested and documented.

I chose two exponents to repeatedly P-1 factor with different bounds. The exponents were chosen to represent near-future P-1 GIMPS wavefront factoring, and 100Mdigit exponents.

Several sets of bounds were used to determine approximately where the number of exponents factored per day is maximized, under the condition that B2 is twice B1. This fits well with gpuowl's minimum bounds having that relationship. Additionally, the tabulated gpu72 bounds, PrimeNet bounds, and 1-test-saved and 2-tests saved bounds were run. The odds of finding a factor were computed for each exponent and bounds combination.

The probable time expenditure in P-1 factoring was computed from the actual logged run times and the odds of finding factors in each run. All cases were premised on trial factoring to gpu72 limits first.

Total number of runs was 12 for the 102M exponent, and 14 for the 333M exponent. This took an RX480 almost ten days to complete, using the then-current commit of gpuowl.

The actual run times and odds of finding a factor for the various cases were combined for 7 different scenarios and 3 GIMPS cases as applicable.
The scenarios are:
  1. Always use 2-tests-saved bounds initially
  2. Use highest factors/day bounds first, use 2-tests bounds on survivors
  3. Use 1-test-saved bounds, PrimeNet reissues for >=2-tests-saved bounds
  4. Use GPU72 bounds, PrimeNet reissues for >=2-tests-saved bounds
  5. Always use 1-test-saved bounds initially
  6. Use highest factors/day bounds first, use 1-test bounds on survivors
  7. Use PrimeNet bounds
The GIMPS cases are:
A: No further LL is done, and a single PRP returning composite is regarded as definitive (eg with proof generation & certification), so future work would consist entirely of single primality tests per exponent. This is a hypothetical or future case. There is still a lot of LL first test, or PRP without proof, being performed. A single LL or PRP (without proof generation) does not address the issue of honest error or false reports.
B: All future primality tests will be double checked, whether LL or PRP, so future work would consist entirely of double primality tests per exponent. This is I think the closest to the 2020 situation for GIMPS. (Only about 1/4 of primality test results were PRP with proof in September 2020.)
C: An equal probability of single or double primality tests going forward, the midpoint case between A and B. This would correspond to PRP being single tested and LL double tested and each occurring at about the 2020 rate.

There are a few scenario and case combinations that don't make sense. This leaves 17 combinations for each exponent.
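Underlying the scenario comparison is a simple expected-value calculation: a P-1 run is worth its factor probability times the primality-test time it would avoid, minus its own run time. A sketch with invented numbers (these values are for illustration only, not the ones in the attached pdf):

```python
def expected_net_savings_hours(p_factor: float, tests_saved: float,
                               test_hours: float, p1_hours: float) -> float:
    # Expected primality-test hours avoided by a P-1 run, net of the run itself.
    return p_factor * tests_saved * test_hours - p1_hours

# Hypothetical wavefront exponent, 80 h per primality test, GIMPS case B
# (all tests double checked, so tests_saved = 2):
light = expected_net_savings_hours(0.03, 2, 80.0, 2.0)  # small bounds
heavy = expected_net_savings_hours(0.05, 2, 80.0, 5.0)  # larger bounds
assert heavy > light  # with these made-up numbers, the larger bounds win
```

Each scenario above is evaluated by summing this quantity over the runs it prescribes, weighted by the odds that each later run actually happens.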

The optimal time savings for the GIMPS cases were computed to be:
A: All single primality tests: use the 1-test-saved bounds

B: The 2020 situation, 2 primality tests: use the PrimeNet bounds

C: Equal probability of single or double primality tests: use the 1-test-saved bounds

The max-factors/day-first scenario was observed to be less effective in all cases, for both exponents. Possibly it would do somewhat better with a different B2/B1 ratio; at the other extreme, the B2/B1 ratio of the 1-test-saved, 2-tests-saved, gpu72, or PrimeNet bounds is typically about 20 to 30.

I concluded that the proper P-1 bounds to use now in gpuowl are the larger mersenne.ca bounds (formerly the PrimeNet row, now the GPU72 row), applied immediately and only: no prior P-1, no 1-test-saved run first, no max-factors/day low-bounds run first. For example, https://www.mersenne.ca/exponent/104399917

Note that these test runs were made before the latest performance advances in gpuowl.

The actual bounds, odds, and calculations are documented in the attached pdf.

After these tests were done, the PrimeNet and gpu72 bounds on mersenne.ca were revised. Formerly the PrimeNet bounds were higher for a given exponent. After the revision the gpu72 line bounds are higher. After PRP with proof adoption is substantially complete, downward adjustment of bounds is likely.

See also M344587487's post re efficiently and effectively performing P-1:
https://mersenneforum.org/showpost.p...2&postcount=10


Top of this reference thread: https://www.mersenneforum.org/showthread.php?t=23391
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: pdf P-1 bounds selection in Gpuowl.pdf (25.3 KB, 178 views)

Last fiddled with by kriesel on 2021-06-17 at 18:43 Reason: added proof generation consideration, updated for dates
Old 2019-12-09, 16:13   #21
kriesel
 
Default Compiling gpuowl

First ask yourself: is this compile really necessary? See https://www.mersenneforum.org/showpo...83&postcount=7 or https://download.mersenne.ca/gpuowl, which also offer older versions precompiled for Windows. These usually bundle the executable, readme file, and help output into a zip file. A Linux build is occasionally available as well, such as in the Google Colab thread.

Requirements for recent github commits of gpuowl:
  • 64-bit Intel or AMD system
  • OpenCL v2.0 support on system and gpu (very old commits, years old, may only require v1.2)
  • AMD APP SDK 3.0 (see https://community.amd.com/thread/227948 for a source for a legal copy and the license agreement)
  • Linux or Linux-like environment, including gcc and make
git is not required, but it makes the job easier. The alternative is to make a subfolder, download the zip file, and unpack it there; this may not get the version string embedded in the executable right without some further manual intervention.
I've taken the conservative approach of using git clone and saving every build in a separate folder named for the version and commit, so I can run any version at any time. If you only want the latest, substitute git pull. There's a handy basic intro to git at https://www.mersenneforum.org/showpo...postcount=1076

The following procedures are for relatively recent commits, since Preda et al incorporated multiple build targets into the Gpuowl makefile addressing both Linux builds and msys2/mingw Windows builds around v6.7 as I recall.

There are two steps: preparing an adequate build environment, and performing the build.

On a local system
  • Linux
    • build environment setup (draft, unchecked, some steps may be unnecessary if the listed items are already installed and reasonably current)
      1. install gcc
      2. install make
      3. install git
      4. install AMD APP SDK 3.0
      5. modify path
      6. create the folder within which builds will be performed
  • Windows
    • build environment setup (done rarely) borrowing liberally from https://www.mersenneforum.org/showpost.php?p=483209&postcount=356
      1. install msys2/mingw64 (see https://www.msys2.org/)
      2. close your other apps before the next step, because I've seen it do an unannounced unpreventable system shutdown/restart
      3. install the 64-bit AMD APP SDK 3.0
      4. Copy the contents of C:\Program Files (x86)\AMD APP SDK\3.0\lib\x86_64 to C:\msys64\mingw64\lib
      5. Copy the contents of C:\Program Files (x86)\AMD APP SDK\3.0\include to C:\msys64\mingw64\include
      6. run msys2 for the following setup steps (some steps may be unnecessary if the listed items are already installed and reasonably current)
      7. install gcc: pacman -S mingw-w64-x86_64-gcc
      8. install make
      9. install git
      10. modify path
        Code:
        #PATH=$PATH\:/dir/path ; export PATH
         PATH=$PATH\:c/msys64/usr/bin ; export PATH
      11. create the folder within which builds will be performed
    • the compile and link
      1. run msys2 for the following build steps
      2. cd to the build folder
      3. git clone https://github.com/preda/gpuowl
      4. cd to the gpuowl subfolder that git clone just created
      5. make gpuowl-win.exe
See also https://www.mersenneforum.org/showpo...6&postcount=40, https://www.mersenneforum.org/showth...t=msys2&page=4

I generally create a folder for each version and commit, eg gpuowl-v6.11-9-g9ae3189.
After an executable is produced, I can drag the executable and the readme up to that folder and have a relatively empty working folder for test, while all the source and .o files sit in the .\gpuowl subfolder below.

I follow a fresh build with gpuowl-win -help and save that output. A nice feature is it shows which OpenCL GPUs it detected.

On cloud computing, there are at least three approaches
  1. Follow the Linux instructions, creating the build environment and compile for every new session. You'll always have the latest commit, including when there's a serious bug, and it will be less efficient use of the cloud computing time.
  2. Follow the Linux instructions once, creating the build environment and compile. Then save the compiled executable on persistent cloud storage that it can be copied back from for future sessions, such as a Google drive dedicated to your GIMPS use. You'll have a stable version, until you decide to upgrade by repeating the build steps, and can save and choose among multiple versions. See also https://www.mersenneforum.org/showpo...30&postcount=7
  3. Instead of doing a build, get one from someone else. Store it on the Google drive that will be used during runs.
On Windows, check that the right build environment is used; the same makefile and commit that succeeds in MSYS2/MINGW64 will fail in a Windows command prompt.

If there's a failure to build, use git bisect to determine at which commit the issue began, and include that info in any issue report provided to Preda. https://git-scm.com/docs/git-bisect


Thanks to kracker, Preda, tServo, and others who helped get me started or fix the occasional broken build environment.


Edit 2021 March 09: None of the above should be mistaken for advocacy of blind adherence to "latest rev is always best". Sometimes it is necessary or useful to build or revert to an older commit that is not available precompiled for the desired OS and version/commit: for a successful build, for stability, for features no longer available in the latest commit(s), to avoid a speed regression, for comparison testing, or for other reasons. In that case, a review of https://stackoverflow.com/questions/...ote-repository may be useful in determining how to modify the above build processes to build the desired commit of the desired branch.


Top of this reference thread: https://www.mersenneforum.org/showthread.php?t=23391
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2021-03-09 at 23:05 Reason: added "Edit 2021 March 09 paragraph" re building other than latest commit
Old 2020-05-28, 22:07   #22
kriesel
 
Default Save file size versus exponent or fft length

I think file sizes will be very nearly the same for PRP3, LL, and stage 1 P-1, since all of them store residues mod Mp.
Attached is observed PRP3 file size except as noted, plus some linear fits and extrapolations.
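Since the dominant content is one residue mod Mp, the expected size is about p bits, i.e. p/8 bytes, plus a small header (the header allowance below is a guess for illustration, not a figure from the attachment):

```python
def estimated_save_bytes(exponent: int, header_bytes: int = 512) -> int:
    # A residue mod M_p occupies about `exponent` bits on disk.
    return exponent // 8 + header_bytes

# A ~100M-digit exponent comes to roughly 41.5 MB per save file.
assert 41e6 < estimated_save_bytes(332_000_000) < 42e6
```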


Top of this reference thread: https://www.mersenneforum.org/showthread.php?t=23391
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: pdf file size versus exponent.pdf (13.5 KB, 152 views)

Last fiddled with by kriesel on 2020-05-28 at 22:08