mersenneforum.org prime95-specific reference material

2018-12-14, 19:51   #1
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,233 Posts
prime95-specific reference material

This thread is here for comparison to the gpu-based applications. Please use the reference discussion thread https://www.mersenneforum.org/showthread.php?t=23383 to make comments or suggestions.

Mprime and prime95 are specific to Intel-compatible processors. Older processor models will have limited support, if any. For ARM and other cpus that are not Intel-compatible, see mlucas.

Stable version:
V30.3 (build 6), and for MacOS v29.8 build 7, are available at the location for what's considered stable, https://www.mersenne.org/download/. V30.3 and later are PRP-proof capable versions, which greatly reduce the effort of verifying a primality test, but require considerably more disk space to do so. See the readme.txt and other documentation included in the compressed distribution file for more info on that.

Older versions:
Versions v29.8b6 and older, for legacy operating systems, are also available at https://www.mersenne.org/download/.

Newer versions:
Generally George leaves the stable version on the download page while newer versions are available and in active development or testing. There is typically a thread for that activity, including occasional announcements of, and links to, new builds. Currently the development / test version is v30.6, which adds P+1 factoring. It is under active test, now at build 4, at https://mersenneforum.org/showpost.p...&postcount=256
For some background on P+1 factoring, see https://mersenneforum.org/showthread.php?t=26700 or https://en.wikipedia.org/wiki/Willia...2B_1_algorithm
Note that P+1 factoring is projected to be not productive enough to be useful in GIMPS wavefront work. Stay with P-1 factoring to established bounds guidelines.

Setup instructions are included at https://www.mersenne.org/download/. Follow with https://www.mersenne.org/gettingstarted/

One thing to avoid is installing into "Program Files" or other restricted directories. Permissions problems will follow from such errant installs. Making a separate working directory for prime95 under the user's home directory is the way to go.

I strongly recommend benchmarking over the range of fft lengths expected to be used, analyzing the results in a spreadsheet, and configuring for the best throughput that is consistent with latencies shorter than the applicable expiration periods.

Configure worker windows for your preferred work type, and make sure that trial factoring is not it; GPUs are far more effective at that.

It is normal for the PrimeNet server to issue a new prime95 installation only LL DC (double-check) assignments, until each prime95/mprime worker has completed 4 LL DC successfully. After a new installation accumulates a history of reliability, the PrimeNet server will allow additional work types.

For remaining questions see the program's extensive included documentation.
PRP run time scaling for low p https://www.mersenneforum.org/showpo...78&postcount=2
P-1 run time scaling https://www.mersenneforum.org/showpo...92&postcount=3
Effect of number of workers https://www.mersenneforum.org/showpo...18&postcount=4
Effect of number of workers (continued) https://www.mersenneforum.org/showpo...19&postcount=5
Effect of frequent interim residue output https://www.mersenneforum.org/showpo...44&postcount=6
Prime95 documentation https://www.mersenneforum.org/showpo...03&postcount=7
Prime95 exponent limits https://www.mersenneforum.org/showpo...74&postcount=8
PRP proof capable versions https://www.mersenneforum.org/showpo...35&postcount=9
Performing version upgrades https://www.mersenneforum.org/showpo...2&postcount=10
Effect of number of workers continued 2 https://www.mersenneforum.org/showpo...4&postcount=11

See also the Concepts in GIMPS Trial Factoring post at https://www.mersenneforum.org/showpo...23&postcount=6

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2021-05-07 at 15:32 Reason: updated link to latest build

2018-12-14, 19:54   #2
kriesel
PRP run time scaling for low p

Run time is fitted as approximately proportional to p^2.094, for 86243 <= p <= 2976221. LL run time is expected to scale very similarly. For comparison, a theoretical fft convolution based primality tester scales as p^2 log p log log p, which over the mersenne.org interval fits as p^2.117. Overhead at low exponents lowers the power on a fit.
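
To see why a p^2 log p log log p model looks like a power slightly above 2 on a log-log plot, here is a small sketch. It is not the fit behind the figures quoted above: it only takes the two-point log-log slope of the theoretical cost between the endpoints of the measured range, and the function names are made up for the illustration.

```python
import math

def fft_cost(p):
    """Theoretical cost model p^2 * log p * log log p (constant factor dropped)."""
    return p * p * math.log(p) * math.log(math.log(p))

def effective_exponent(p_lo, p_hi, cost=fft_cost):
    """Two-point log-log slope: the k for which cost ~ p^k between p_lo and p_hi."""
    return math.log(cost(p_hi) / cost(p_lo)) / math.log(p_hi / p_lo)

# Endpoints of the exponent range quoted in the post.
k = effective_exponent(86243, 2976221)
print(f"effective exponent over the range: {k:.3f}")
```

The logarithmic factors grow slowly, so they only add a small amount to the exponent of 2. A fit over a different exponent interval, or a least-squares fit over many points, will give a somewhat different value.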

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
 prp run times low Mp.pdf (15.9 KB, 253 views)

Last fiddled with by kriesel on 2019-11-18 at 14:30

2018-12-24, 22:06   #3
kriesel
Prime95 P-1 run time scaling

A small number of widely spaced exponents were run to observe the run time scaling.

For prime95 v29.4b8 x64 run on a Windows 7 x64 system with dual e5-2670 chips, 4 cores (half a chip package) per worker, 32,000 MB allowance per worker, run time was approximately proportional to exponent p^2.33 up to 595M (27 days), a somewhat higher power than observed for P-1 on gpus (~2.1).

Another prime95 v29.4b8 x64 run on an FMA equipped i7-7500U Windows 10 X64 system seemed to be taking inordinately long to perform P-1, at p=101M, on 7,200 MB memory allowed, one core. It had been running for two weeks to perform stage 1 and reach 90% in stage 2. It appeared to be paging to disk excessively. The same system can complete an 83M primality test per core in about 2.5 weeks. It was allowed to complete that P-1 and then reset to 4096M memory allowed, after it was found to still page excessively at 6144M. This is a system with 8GB ram currently. In all cases it was running 1 core per worker; the other worker was running an 83M LL. It projected P-1 run times ranging from 4.4 days for 201M to 43 days for 605M, 67 days for 701M. However, attempting 605M resulted in "Cannot initialize FFT code, errcode=1002".
The fit to observed run time is p^2.087 (with five data points).
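
For anyone wanting to reproduce this kind of fit without a spreadsheet, a minimal sketch follows. It does a least-squares straight-line fit of ln(run time) against ln(p). The five observed data points are in the attachment and are not reproduced here, so the sample input is only the three projected run times quoted in the paragraph above; the resulting exponent is therefore not expected to match 2.087 exactly. The function name is made up for the illustration.

```python
import math

def fit_power_law(points):
    """Least-squares fit of t = c * p^k; points is a list of (p, t) pairs."""
    xs = [math.log(p) for p, _ in points]
    ys = [math.log(t) for _, t in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx                      # slope on the log-log plot
    c = math.exp(mean_y - k * mean_x)  # prefactor
    return k, c

# Projected P-1 run times quoted in the post: (exponent, days).
projected = [(201e6, 4.4), (605e6, 43.0), (701e6, 67.0)]
k, c = fit_power_law(projected)
print(f"fitted exponent: {k:.2f}")
```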

Another run, a mix of prime95 V29.7b1, v29.8b3, and v29.8b6, on an FMA equipped i7-8750H Windows 10 X64 system was able to run 801M (at 8GB allocated of its 16GB installed ram, 37 days run time), and 901M (at 12GB allocated, 57 days run time) and is expected to be capable of up to 920.8M. The offset in the estimated days runtime is believed to be due to whether mfakto is running on the Intel igp or not. It seems to be using somewhat lower bounds than GPU72 figures for exponents above p~400M.

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
 p-1 run time scaling e5-2670.pdf (14.1 KB, 190 views) p-1 run time scaling FMA i7-7500u.pdf (16.1 KB, 195 views) p-1 run time scaling FMA i7-8750H.pdf (18.4 KB, 194 views)

Last fiddled with by kriesel on 2020-01-05 at 14:05 Reason: updated i7-8750h attachment for new data

2018-12-28, 18:13   #4
kriesel
Effect of number of workers

Similar to the number of threads choices in gpu applications, on multicore systems, the effect of number of cores per worker in prime95 is unpredictable, and so there is provision for benchmarking.

Number of workers could be chosen to optimize performance. But which measure of performance? Aggregate throughput maximized, latency of one assignment minimized, number of joules used for a 100 GHz-day primality test, aggregate throughput given a constraint of latency low enough to avoid assignment expiration, or something else? And for which single fft length, or for the current and next several?

For minimum latency, as for confirming a newly discovered Mersenne prime, Madpoo has run experiments on a dual-14-core system. He reported the fastest primality test time around 20 cores out of the 28 available; any more than 6 on the lesser use package, and the increased package to package data transfers slow the progress.

To pick a number of cores per worker for each cpu type that is a reasonable compromise for maximum aggregate throughput, so that I can set it and forget it for months or years on each system, I ran the built-in prime95 benchmarking over wide fft ranges for a variety of cores/worker, on a variety of cpu types. The timings were then tabulated in spreadsheets and graphed.
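
One of the measures listed above, aggregate throughput subject to a latency limit, can be sketched as below. This is only an illustration of the selection logic. The benchmark numbers are PLACEHOLDERS invented for the example, not measurements from any of the attached files, and the class and function names are hypothetical. It assumes a primality test of exponent p takes about p iterations.

```python
from dataclasses import dataclass

@dataclass
class Config:
    workers: int
    iters_per_sec_per_worker: float  # from a benchmark at the fft length of interest

def latency_days(cfg, exponent):
    """Days for one worker to finish one test of about `exponent` iterations."""
    return exponent / cfg.iters_per_sec_per_worker / 86400

def aggregate_throughput(cfg):
    """Total iterations per second over all workers."""
    return cfg.workers * cfg.iters_per_sec_per_worker

def best_config(configs, exponent, max_days):
    """Highest aggregate throughput among configs that meet the latency limit."""
    ok = [c for c in configs if latency_days(c, exponent) <= max_days]
    return max(ok, key=aggregate_throughput) if ok else None

# PLACEHOLDER benchmark results, invented for illustration only.
configs = [Config(1, 400.0), Config(2, 210.0), Config(4, 110.0), Config(8, 5.0)]
choice = best_config(configs, exponent=100_000_000, max_days=30)
print(choice)
```

Tightening max_days pushes the choice toward fewer workers with more cores each, which is the throughput versus latency trade-off described above.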

If going after the maximum performance per fft length, consider that some work types restart from the beginning when the number of workers is changed. Read the readme.txt and other files, back up before changing number of workers, plan ahead, etc.

Some patterns emerge. Worker counts that would straddle the divide between processor packages if divided evenly typically do not provide as much throughput. A 12-core 2-package system with 3 workers with equal cores/worker would have at least one worker with cores in each package (4, 2+2, 4). George indicates recent versions of prime95 prevent the straddle by assigning unequal numbers of cores to the workers.
For larger core counts there can be quite a few choices to evaluate. What's fastest for one fft length may not be for others. A compromise that averages a small percentage penalty is usually available. Plotting the various combinations with trend lines seems a useful visualization method for selecting one configuration to run with for a long time.
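
The straddling case described above is easy to check for a given machine. The sketch below assumes cores are numbered consecutively package by package and handed to workers in equal contiguous shares; that is a simplification for illustration, not a description of how prime95 actually assigns cores. The function names are made up.

```python
def split_evenly(cores_per_package, packages, workers):
    """Give each worker an equal contiguous share; return per-package core counts."""
    total = cores_per_package * packages
    if total % workers:
        raise ValueError("cores do not divide evenly among workers")
    share = total // workers
    layout = []
    for w in range(workers):
        counts = {}
        for core in range(w * share, (w + 1) * share):
            pkg = core // cores_per_package
            counts[pkg] = counts.get(pkg, 0) + 1
        layout.append(counts)
    return layout

def straddlers(layout):
    """Indices of workers whose cores sit in more than one package."""
    return [w for w, counts in enumerate(layout) if len(counts) > 1]

layout = split_evenly(cores_per_package=6, packages=2, workers=3)
print(layout)              # [{0: 4}, {0: 2, 1: 2}, {1: 4}]
print(straddlers(layout))  # [1]
```

For the 12-core 2-package example, the middle worker ends up with 2 cores in each package, which is the configuration to avoid or to benchmark with extra suspicion.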

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
 2-core Core 2 duo e8200 eagle performance.pdf (58.0 KB, 40 views) 2-core i3-370M parrot performance.pdf (57.6 KB, 42 views) dual 4-core e5520 lenovo performance.pdf (57.7 KB, 36 views) dual 6-core x5650 condorette performance.pdf (60.3 KB, 39 views) dual 6-core E5645 performance.pdf (31.9 KB, 34 views)

Last fiddled with by kriesel on 2021-03-17 at 17:44

2018-12-28, 18:16   #5
kriesel
Effect of number of workers continued

Working around the 5-attachment limit per post:

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
 dual 8-core e5-2670 emu performance.pdf (59.6 KB, 39 views) 4-core i7-4790 asrock performance.pdf (67.5 KB, 40 views) dual-12-core e5-2697v2 roa performance.pdf (145.8 KB, 32 views) i3-8121u nuc performance.pdf (121.3 KB, 38 views) i5-1035g1 performance.pdf (128.2 KB, 40 views)

Last fiddled with by kriesel on 2021-03-17 at 18:07 Reason: some updated to include max exponent and latency

2019-03-14, 03:49   #6
kriesel
Effect of frequent Res64 output

Timing runs on LL DC on the same 51M exponent and old 32-bit hardware with prime95 29.4b7 yield conflicting information on the cost of a Res64 output as a multiple of an ordinary iteration. The res64 cost is estimated as 7/8 to 4 times an iteration. Note that because of numbering skew between prime95 and other conventions, prime95 outputs res64 at 3 successive iterations, with cost ~3.1 to 12 times an iteration. The lower value is based on prime95-provided timings per iteration, the higher value on prime95-provided time stamp of 1 second resolution of the res64 output line.

An initial attempt to make a similar measurement on an i7-8750H with UHD630 igp in prime95 v29.4b8 x64 yielded negative per-res64 cost in two tries. I speculate this was an interaction with mfakto running at the same time on the same chip package power budget. Performance monitor indicates the cpu utilization drops considerably when frequent interim residue output is enabled.

A retest, with the UHD630 mfakto instance halted, yielded timings that indicate a cost per PRP3 res64 interim output on the i7-8750H system of 2.7 seconds, equivalent to 263 iterations, on an 83M primality test. When outputting an interim residue every 10 iterations, one of the 6 cores stays very busy while the rest are only used at a low duty cycle. This cut throughput from 96.6 iter/sec to 3.54 iter/sec, a rather severe 96.3% reduction. The estimated effect on run time for the exponent when producing interim residues for the primenet server at 5,000,000 iteration intervals is about 45 seconds, 52ppm of run time. The retest was brief, taking 48 seconds for iterations with interim residues, and 114 seconds without, so accuracy is no better than a percent or two. Note also that the cpu clock was not held constant during the test. In this case the agreement between time stamp based rates and program-computed ms/iter was very good, ~1/4%.

Another test, on a dual-xeon-e5-2690 system, v29.6b6 x64 on Win10, 4 cores/worker, 83.9M PRP tests, gave ~305 iterations/interim residue64, 3.45 sec/interim residue, or around 61ppm for the default 5,000,000 iteration interval. The preceding figure ignores the initial 500K-iteration interim residue, which raises the impact a bit to 65ppm for ~84M exponents, and somewhat more for DC exponents.
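
The arithmetic behind the figures above is simple enough to put in a few lines, for anyone wanting to plug in their own timings. This is only a sketch of the calculation as I read it from the numbers quoted; the function names are made up.

```python
def residue_cost_in_iterations(seconds_per_residue, iters_per_sec):
    """Express the time cost of one interim residue as a number of iterations."""
    return seconds_per_residue * iters_per_sec

def overhead_ppm(cost_iterations, interval_iterations):
    """Run time overhead, in parts per million, of one residue per interval."""
    return cost_iterations / interval_iterations * 1e6

def throughput_reduction_percent(before, after):
    """Percent loss in iterations per second."""
    return (1 - after / before) * 100

# i7-8750H figures quoted above: 2.7 s per residue at 96.6 iter/sec.
print(residue_cost_in_iterations(2.7, 96.6))
# e5-2690 figures quoted above: ~305 iterations per residue, 5,000,000 interval.
print(overhead_ppm(305, 5_000_000))
# Residue output every 10 iterations: 96.6 iter/sec down to 3.54 iter/sec.
print(throughput_reduction_percent(96.6, 3.54))
```

The first result comes out near 261 iterations rather than the 263 quoted, presumably because the 2.7 seconds and 96.6 iter/sec in the post are rounded values.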

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
 res64 timing for prime95.pdf (12.1 KB, 214 views) res64 timing for prime95 v29.4b8 i7-8750h.pdf (13.1 KB, 207 views) res64 timing for prime95 v29.6b6 e5-2690.pdf (11.8 KB, 210 views)

Last fiddled with by kriesel on 2019-11-18 at 14:31

2019-08-12, 17:22   #7
kriesel
Prime95 documentation

Most GIMPS applications include a readme file. Prime95 has very comprehensive documentation included in the zip package, in multiple files.
License.txt for the license terms
Readme.txt for the new user and periodic reference
Whatsnew.txt particularly useful when upgrading
Stress.txt relating to stress testing and reliability testing
Undoc.txt documentation of the perhaps less frequently used options
Read them early and often. As with the other applications, reading the documentation again after additional experience with the program is useful.

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2019-11-18 at 14:32

2020-05-25, 00:07   #8
kriesel
Prime95 exponent limits

Prime95 and its sibling mprime contain many code paths specific to processor types and exponent magnitudes. What range of exponents is supported varies by processor type. I think what has been implemented was determined by a combination of processor throughput versus exponent size and decisions by George on which to spend his programming time.

There are several ways to determine what these limits are.
George has made statements about them in email or on the forum.
https://mersenneforum.org/showpost.p...&postcount=219

The whatsnew.txt describes numerous changes in what was supported.

The source code is available for examination.

Trying runs on differing hardware and OS may obscure the situation, because it could be that it's an old operating system version, not the processor type, that prevents running some versions of code.
Attached Files
 processor specific summary.txt (250 Bytes, 199 views) exponent limits versus hardware.txt (278 Bytes, 182 views) prime95 exponent and fft length limits vs processor type from mult.asm source.txt (2.5 KB, 194 views) prime95 fft highlights of whatsnew.txt (7.0 KB, 202 views)

Last fiddled with by kriesel on 2020-05-25 at 00:21

2020-07-27, 16:03   #9
kriesel
PRP proof capable versions

UPDATE: V30.3b6 is now generally available. This automatically uploads proof files and includes resource limit features. Direct download links for prime95 64-bit for Windows; mprime 64-bit for Linux. (32 bit and other variations also available.) V30.3b6 appears on the main GIMPS software page and the mersenne.ca download mirror.

Previously:
Per https://www.mersenneforum.org/showpo...&postcount=119 V30.1b1 prime95 or mprime are available and require manual uploading of proof files. Direct download from dropbox: prime95 for Windows 64-bit; mprime for Linux 64-bit
A run of PRP with proof becomes conspicuous by its multi-gigabyte p.residues file.
These downloads contain all the necessary code including dll files. (V30.2b1 DID NOT contain the dlls. Install v30.1b1 first, then v30.2b1 atop it.)

The standalone command-line uploader, which works for gpuowl as well as prime95, is described briefly at https://www.mersenneforum.org/showpo...&postcount=154 but the direct download from dropbox for Windows x64 is no longer available. It can be found as an attachment at https://www.mersenneforum.org/showpo...0&postcount=26
NOTE: it is not being maintained, and preferred usage is upload through a current version of prime95 or mprime.
Usage is
Code:
uploader user_id proof_filename[ chunk_size[ upload_rate_limit]]
with chunk_size expressed in MB and upload_rate_limit expressed in Mbps apparently.
(Note, for gpuowl, there are more choices; https://www.mersenneforum.org/showpo...0&postcount=26, some of which might conceivably apply to prime95/mprime too, at least for the most adventurous. But I encourage users to stick with prime95 & mprime's built in PrimeNet API & supported features whenever practical.)

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2020-09-17 at 14:14 Reason: V30.3b6 general release update

2020-10-07, 06:50   #10
kriesel
Performing version upgrades

The most efficient method will depend on whether it's a single install or a fleet of them to be upgraded. Each leaves your results files, worktodo, log files, work in progress files, and configuration files in place and undisturbed. (But you should be doing regular system backups anyway.)

Single install:
1. Stop and exit the prime95 program to allow prime95 program files to be overwritten.
2. Download the zip file.
3. Unzip it.
4. If necessary, move the new files into your working directory. Select replace if prompted.
5. Restart the program in the working directory.

Multiple systems, with USB drive:
1. Download the zip file.
2. Put it onto the USB drive.
3. Unzip it there.
On each system:
1. Insert the USB stick.
2. Stop and exit the prime95 program to allow prime95 program files to be overwritten.
3. Copy the new version's files from the USB stick to the working directory, overwriting the old.
4. Start the program in the working directory.
5. "Eject" the USB drive. Its file explorer window will close.
6. Remove the USB stick.

Multiple systems, with network drive:
1. Download the zip file.
2. Put it onto the network drive.
3. Unzip it there.
On each system:
1. In file explorer, navigate to the update version prime95 folder on the network drive.
2. Stop and exit the prime95 program to allow prime95 program files to be overwritten.
3. Copy the new version's files from the network folder to the working directory, overwriting the old.
4. Start the program in the working directory.
5. Close the file explorer window for the update version folder.

It's possible to streamline the above somewhat with a bit of batch script.
Strictly speaking, it is not necessary to copy and overwrite files that have not changed from the previous version, but it does little harm. Unneeded copying can be efficiently avoided by date sorting both source and destination folders, and only copying what's newer than the corresponding destination file.

For more detail, quoted with some editing, from S485122 at https://mersenneforum.org/showpost.p...61&postcount=4
prime.txt contains the GIMPS user data,
local.txt contains the machine data,
worktodo.txt contains the current work (assigned or not),
at some times a file named prime.spl, which contains the results not yet transmitted to the server, might be present,
the work files pnnnnnnn mnnnnnnnn etc and their backup copies .bu, bu2, etc...
None of these files are in the prime95.zip archive and will thus not be overwritten. They are essential for continuity.
There are other user files that are not in the archive either, but they are less critical (results.txt, results.json.txt, prime.log, gwnum.txt, ...)

In other words, keep all other files in the folder, since they contain your user and machine data and preferences, your work in progress and results. The only files overwritten will be the program and version dependent files.

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2020-11-19 at 22:26

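The "only copy what's newer" step from the upgrade post above can be scripted. The post mentions batch script; what follows is instead a minimal Python sketch of the same idea, under the assumption that the unzipped new version sits in one folder and the working directory is another. The function name and the example paths are hypothetical. Stop and exit prime95 before running anything like this, as in the manual steps.

```python
import shutil
from pathlib import Path

def copy_newer(src_dir, dst_dir):
    """Copy each file from src_dir to dst_dir if it is missing or older there.

    Files that exist only in dst_dir (worktodo.txt, prime.txt, local.txt,
    save files, results) are never touched.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    copied = []
    for src in sorted(src_dir.iterdir()):
        if not src.is_file():
            continue
        dst = dst_dir / src.name
        if not dst.exists() or src.stat().st_mtime > dst.stat().st_mtime:
            shutil.copy2(src, dst)  # copy2 also carries over the modification time
            copied.append(src.name)
    return copied
```

Usage would be along the lines of copy_newer("E:/prime95-new", "C:/Users/me/prime95"), with both paths replaced by your own. It only looks at the top level of the source folder, which is an assumption about how the zip unpacks.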
2020-11-15, 19:38   #11
kriesel
Effect of number of workers continued 2

Additional processor types:
FMA3 capable 6-core i7-8750H (no code running on the IGP at the time)
Xeon Phi 7250 (68 cores in one socket) see also https://www.mersenneforum.org/showthread.php?t=25767
Attached Files
 6-core i7-8750H peregrine performance.pdf (64.5 KB, 36 views) 68-core xeon phi 7250 performance.pdf (167.6 KB, 39 views)

Last fiddled with by kriesel on 2021-03-17 at 18:10

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.