mersenneforum.org  

#1 by kriesel, 2018-12-14, 19:51
prime95-specific reference material

This thread collects prime95/mprime reference material, for comparison with the GPU-based applications. Please use the reference discussion thread https://www.mersenneforum.org/showthread.php?t=23383 to make comments or suggestions.

Mprime and prime95 are specific to Intel-compatible (x86) processors; older processor models have limited support, if any. For ARM and other non-Intel-compatible CPUs, see Mlucas.

Version 29.8 (build 6; build 7 for macOS) is available at https://www.mersenne.org/download/. See also the PRP-proof-capable v30.x versions, which greatly reduce the effort of verifying a primality test.

Setup instructions are included at https://www.mersenne.org/download/. Follow up with https://www.mersenne.org/gettingstarted/.
I strongly recommend benchmarking over the range of FFT lengths you expect to use, analyzing the results in a spreadsheet, and configuring for the best throughput consistent with latencies shorter than the applicable assignment expiration periods. Configure worker windows for your preferred work type, and make sure trial factoring is not it; GPUs are far more effective at that.
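For a quick latency sanity check, the sketch below (Python; the 110M exponent and 4.5 ms/iter are made-up placeholders, not benchmark results) estimates how many days a single PRP or LL assignment would take at a benchmarked ms/iter, for comparison against the assignment expiration period.
Code:
# Rough latency estimate for one PRP/LL assignment, to compare against the
# assignment expiration period. A test of M(p) takes roughly p iterations.
def days_to_complete(exponent, ms_per_iter):
    seconds = exponent * ms_per_iter / 1000.0
    return seconds / 86400.0

# Hypothetical example: a 110M exponent at a benchmarked 4.5 ms/iter
print(f"{days_to_complete(110_000_000, 4.5):.1f} days")   # about 5.7 days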

For remaining questions see the program's extensive included documentation.

PRP run time scaling for low p https://www.mersenneforum.org/showpo...78&postcount=2
P-1 run time scaling https://www.mersenneforum.org/showpo...92&postcount=3
Effect of number of workers https://www.mersenneforum.org/showpo...18&postcount=4
Effect of number of workers (continued) https://www.mersenneforum.org/showpo...19&postcount=5
Effect of frequent interim residue output https://www.mersenneforum.org/showpo...44&postcount=6
Prime95 documentation https://www.mersenneforum.org/showpo...03&postcount=7
Prime95 exponent limits https://www.mersenneforum.org/showpo...74&postcount=8
PRP proof capable versions https://www.mersenneforum.org/showpo...35&postcount=9
Performing version upgrades https://www.mersenneforum.org/showpo...2&postcount=10
Effect of number of workers continued 2 https://www.mersenneforum.org/showpo...4&postcount=11

See also the Concepts in GIMPS Trial Factoring post at https://www.mersenneforum.org/showpo...23&postcount=6


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2020-11-19 at 21:25 Reason: added effect of number of workers continued 2
#2 by kriesel, 2018-12-14, 19:54
PRP run time scaling for low p

Run time is fitted as approximately proportional to p^2.094 for 86243 <= p <= 2976221. LL run time is expected to scale very similarly. For comparison, a theoretical FFT-convolution-based primality tester scales as p^2 log p log log p, which over the mersenne.org interval fits as p^2.117. Overhead at low exponents lowers the power in a fit.
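As a cross-check of the theoretical scaling, a short sketch (Python; the sample count and log-spaced sampling are arbitrary choices) computes the effective power of p for the p^2 log p log log p model by a least-squares fit in log-log space over the same exponent range as the measured fit:
Code:
# Effective power of p for the theoretical cost model t(p) ~ p^2 log p log log p,
# from a least-squares fit in log-log space over 86243 <= p <= 2976221.
# The slope lands a little above 2, consistent with the fitted 2.09-2.12 above.
import numpy as np

p = np.logspace(np.log10(86_243), np.log10(2_976_221), 50)
t = p**2 * np.log(p) * np.log(np.log(p))
slope, _intercept = np.polyfit(np.log(p), np.log(t), 1)
print(f"effective exponent ~ {slope:.3f}")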


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: pdf prp run times low Mp.pdf (15.9 KB, 123 views)

Last fiddled with by kriesel on 2019-11-18 at 14:30
#3 by kriesel, 2018-12-24, 22:06
Prime95 P-1 run time scaling

A small number of widely spaced exponents were run to observe the run time scaling.

For prime95 v29.4b8 x64 run on a Windows 7 x64 system with dual e5-2670 chips, 4 cores (half a chip package) per worker, and a 32,000 MB memory allowance per worker, run time was approximately proportional to p^2.33 up to 595M (27 days), a somewhat higher power than observed for P-1 on GPUs (~2.1).

Another prime95 v29.4b8 x64 run, on an FMA-equipped i7-7500U Windows 10 x64 system with 7,200 MB memory allowed and one core, seemed to be taking inordinately long to perform P-1 at p=101M. It had been running for two weeks to perform stage 1 and reach 90% of stage 2, and appeared to be paging to disk excessively. The same system can complete an 83M primality test per core in about 2.5 weeks. It was allowed to complete that P-1 and was then reset to 4,096 MB memory allowed, after it was found to still page excessively at 6,144 MB. This system currently has 8 GB of RAM. In all cases it was running 1 core per worker; the other worker was running an 83M LL test. It projected P-1 run times ranging from 4.4 days for 201M to 43 days for 605M and 67 days for 701M. However, attempting 605M resulted in "Cannot initialize FFT code, errcode=1002".
The fit to observed run time is p^2.087 (with five data points).

Another run, using a mix of prime95 v29.7b1, v29.8b3, and v29.8b6 on an FMA-equipped i7-8750H Windows 10 x64 system, was able to run 801M (8 GB allocated of its 16 GB installed RAM, 37 days run time) and 901M (12 GB allocated, 57 days run time), and is expected to be capable of up to 920.8M. The offset in the estimated days of run time is believed to depend on whether mfakto is running on the Intel IGP or not. Prime95 seems to use somewhat lower bounds than the GPU72 figures for exponents above p~400M.
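For illustration, a minimal sketch (Python) of scaling one measured P-1 run time to another exponent with the fitted power law t ~ p^k, using the i7-7500U figures quoted above (k ~ 2.087, about 4.4 days at 201M):
Code:
# Scale one measured P-1 run time to another exponent via t ~ p^k,
# using the i7-7500U figures above (k ~ 2.087, ~4.4 days at p = 201M).
def scale_runtime(t_ref_days, p_ref, p_new, k=2.087):
    return t_ref_days * (p_new / p_ref) ** k

print(f"{scale_runtime(4.4, 201e6, 605e6):.0f} days")   # ~44, near the projected 43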


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: pdf p-1 run time scaling e5-2670.pdf (14.1 KB, 83 views)
File Type: pdf p-1 run time scaling FMA i7-7500u.pdf (16.1 KB, 88 views)
File Type: pdf p-1 run time scaling FMA i7-8750H.pdf (18.4 KB, 82 views)

Last fiddled with by kriesel on 2020-01-05 at 14:05 Reason: updated i7-8750h attachment for new data
#4 by kriesel, 2018-12-28, 18:13
Effect of number of workers

As with the thread-count choices in the GPU applications, the effect of the number of cores per worker in prime95 on multicore systems is unpredictable, so the program provides built-in benchmarking.

The number of workers could be chosen to optimize performance. But which measure of performance? Maximum aggregate throughput, minimum latency for one assignment, fewest joules for a 100 GHz-day primality test, aggregate throughput subject to a latency low enough to avoid assignment expiration, or something else? And for which single FFT length, or for the current and next several?

For minimum latency, as when confirming a newly discovered Mersenne prime, Madpoo has run experiments on a dual-14-core system. He reported the fastest primality test time at around 20 cores of the 28 available; with more than about 6 cores on the lesser-used package, the increased package-to-package data transfers slow progress.

To pick a cores-per-worker setting for each CPU type that is a reasonable compromise for maximum aggregate throughput, so I can set it and forget it for months or years on each system, I ran the built-in prime95 benchmarking over wide FFT ranges for a variety of cores/worker counts on a variety of CPU types. The timings were then tabulated in spreadsheets and graphed.

If going after maximum performance per FFT length, consider that some work types restart from the beginning when the number of workers is changed. Read readme.txt and the other documentation files, back up before changing the number of workers, and plan ahead.

Some patterns emerge. Worker counts that would straddle the divide between processor packages, if the cores were divided evenly, typically do not provide as much throughput. A 12-core 2-package system with 3 workers of equal cores/worker would have at least one worker with cores in each package (4, 2+2, 4). George indicates recent versions of prime95 prevent the straddle by assigning unequal numbers of cores to the workers.
For larger core counts there can be quite a few choices to evaluate. What's fastest for one FFT length may not be for others. A compromise that averages a small percentage penalty is usually available. Plotting the various combinations with trend lines seems a useful visualization method for selecting one configuration to run with for a long time.
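Below is a minimal sketch (Python) of the throughput-versus-latency comparison described above. The 12-core, 2-package machine and all ms/iter figures are hypothetical placeholders; real numbers come from prime95's built-in benchmark.
Code:
# Throughput vs. latency for different worker splits at one FFT length.
# All timings (ms/iter per worker) are hypothetical placeholders.
configs = {
    # workers: (cores per worker, ms/iter per worker)
    1:  (12, 1.9),
    2:  (6,  3.4),
    4:  (3,  6.2),
    12: (1, 16.5),
}
p = 110_000_000   # exponent of interest; a PRP test is roughly p iterations

for workers, (cores, ms_iter) in configs.items():
    throughput = workers * 1000.0 / ms_iter        # aggregate iter/s, all workers
    latency    = p * ms_iter / 1000.0 / 86400.0    # days to finish one assignment
    print(f"{workers:2d} x {cores:2d} cores: {throughput:6.1f} iter/s total, "
          f"{latency:5.1f} days per assignment")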


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: pdf 2-core Core 2 Duo E8200 performance.pdf (27.7 KB, 89 views)
File Type: pdf 2-core i3-M370 performance.pdf (28.3 KB, 94 views)
File Type: pdf dual 4-core e5520 performance.pdf (28.3 KB, 82 views)
File Type: pdf dual 6-core X5650 performance tune.pdf (30.7 KB, 89 views)
File Type: pdf dual 6-core E5645 performance.pdf (31.9 KB, 97 views)

Last fiddled with by kriesel on 2020-11-15 at 19:28
#5 by kriesel, 2018-12-28, 18:16
Effect of number of workers continued

Working around the 5-attachment limit per post:


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: pdf dual 8-core e5-2670 performance.pdf (29.0 KB, 92 views)
File Type: pdf i7-4790 performance.pdf (37.4 KB, 84 views)
File Type: pdf dual e5-2697 prime95 performance.pdf (125.6 KB, 90 views)
File Type: pdf nuc performance.pdf (89.5 KB, 62 views)
File Type: pdf i5-1035g1 performance.pdf (96.3 KB, 8 views)

Last fiddled with by kriesel on 2020-11-13 at 13:27 Reason: cosmetic cleanup for i5-1035G1
#6 by kriesel, 2019-03-14, 03:49
Effect of frequent Res64 output

Timing runs of LL DC on the same 51M exponent and old 32-bit hardware with prime95 29.4b7 yield conflicting information on the cost of a res64 output as a multiple of an ordinary iteration. The res64 cost is estimated as 7/8 to 4 times an iteration. Note that because of iteration-numbering skew between prime95 and other conventions, prime95 outputs a res64 at 3 successive iterations, with cost ~3.1 to 12 times an iteration. The lower value is based on prime95-provided timings per iteration, the higher value on the prime95-provided time stamps, with 1-second resolution, of the res64 output lines.

An initial attempt to make a similar measurement on an i7-8750H with UHD630 igp in prime95 v29.4b8 x64 yielded negative per-res64 cost in two tries. I speculate this was an interaction with mfakto running at the same time on the same chip package power budget. Performance monitor indicates the cpu utilization drops considerably when frequent interim residue output is enabled.

A retest, with the UHD630 mfakto instance halted, yielded timings that indicate a cost per PRP3 res64 interim output on the i7-8750H system of 2.7 seconds, equivalent to 263 iterations, on an 83M primality test. One of the 6 cores stays very busy while the rest are only used at a low duty cycle when outputting an interim residue every 10 iterations. This cut throughput from 96.6 iter/sec to 3.54 iter/sec, a rather severe 96.3% reduction. The estimated effect on run time for the exponent when producing interim residues for the PrimeNet server at 5,000,000-iteration intervals is about 45 seconds, 52 ppm of run time. The retest was brief, taking 48 seconds for iterations with interim residues and 114 seconds without, so accuracy is no better than a percent or two. Note also that the CPU clock was not held constant during the test. In this case the agreement between time-stamp-based rates and program-computed ms/iter was very good, ~1/4%.

Another test, on a dual-xeon-e5-2690 system, v29.6b6 x64 on Win10, 4 cores/worker, 83.9M PRP tests, gave ~305 iterations/interim residue64, 3.45 sec/interim residue, or around 61ppm for the default 5,000,000 iteration interval. The preceding figure ignores the initial 500K-iteration interim residue, which raises the impact a bit to 65ppm for ~84M exponents, and somewhat more for DC exponents.
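The per-exponent impact quoted above follows from simple arithmetic; a minimal sketch (Python) using the i7-8750H retest figures:
Code:
# Arithmetic behind the "~45 seconds, 52 ppm" estimate for the i7-8750H retest.
p            = 83_000_000    # exponent under test
iters_per_s  = 96.6          # normal throughput (measured above)
cost_per_out = 2.7           # seconds per interim res64 output (measured above)
interval     = 5_000_000     # default PrimeNet interim residue interval

outputs = p // interval                  # ~16 interim residues during the run
extra_s = outputs * cost_per_out         # ~43 s added to the run
total_s = p / iters_per_s                # ~860,000 s (~10 days) baseline
print(f"{extra_s:.0f} s extra, {1e6 * extra_s / total_s:.0f} ppm of run time")
# Counting the extra initial residue at 500K iterations brings this to roughly
# the ~45 s / 52 ppm figure quoted above.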


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Thumbnails
frequent res64 hit on peregrine.png (96.2 KB)
Attached Files
File Type: pdf res64 timing for prime95.pdf (12.1 KB, 87 views)
File Type: pdf res64 timing for prime95 v29.4b8 i7-8750h.pdf (13.1 KB, 92 views)
File Type: pdf res64 timing for prime95 v29.6b6 e5-2690.pdf (11.8 KB, 89 views)

Last fiddled with by kriesel on 2019-11-18 at 14:31
#7 by kriesel, 2019-08-12, 17:22
Prime95 documentation

Most GIMPS applications include a readme file. Prime95 has very comprehensive documentation included in the zip package, in multiple files.
  • License.txt for the license terms
  • Readme.txt for the new user and periodic reference
  • Whatsnew.txt particularly useful when upgrading
  • Stress.txt relating to stress testing and reliability testing
  • Undoc.txt documentation of the perhaps less frequently used options
Read them early and often. As with the other applications, rereading the documentation after gaining more experience with the program is useful.


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2019-11-18 at 14:32
#8 by kriesel, 2020-05-25, 00:07
Prime95 exponent limits

Prime95 and its sibling mprime contain many code paths specific to processor types and exponent magnitudes. The range of exponents supported varies by processor type. I think what has been implemented was determined by a combination of processor throughput versus exponent size and decisions by George about where to spend his programming time.

There are several ways to determine what these limits are:
  • George has made statements about them in email or on the forum, e.g. https://mersenneforum.org/showpost.p...&postcount=219
  • Whatsnew.txt describes numerous changes in what was supported.
  • The source code is available for examination.

Trying runs on differing hardware and operating systems may obscure the situation, because it could be an old operating-system version, not the processor type, that prevents some versions of the code from running.

Last fiddled with by kriesel on 2020-05-25 at 00:21
#9 by kriesel, 2020-07-27, 16:03
PRP proof capable versions

UPDATE:
V30.3b6 is now generally available. It automatically uploads proof files and includes resource-limit features. Direct download links are available for prime95 64-bit for Windows and mprime 64-bit for Linux (32-bit and other variations are also available).

V30.3b6 appears on the main GIMPS software page and the mersenne.ca download mirror.

Previously:
Per https://www.mersenneforum.org/showpo...&postcount=119 V30.1b1 prime95 or mprime are available and require manual uploading of proof files.
Direct download from dropbox: prime95 for Windows 64-bit; mprime for Linux 64-bit
A run of PRP with proof becomes conspicuous by its multi-gigabyte p.residues file.
These downloads contain all the necessary code including dll files.
(V30.2b1 DID NOT contain the dlls. Install v30.1b1 first, then v30.2b1 atop it.)

The standalone command-line uploader, which works for gpuowl as well as prime95, is described briefly at https://www.mersenneforum.org/showpo...&postcount=154
but the direct download from dropbox for Windows x64 is no longer available. It can be found as an attachment at https://www.mersenneforum.org/showpo...0&postcount=26
NOTE: it is not being maintained, and preferred usage is upload through a current version of prime95 or mprime.

Usage is
Code:
uploader user_id proof_filename[ chunk_size[ upload_rate_limit]]
with chunk_size expressed in MB and upload_rate_limit apparently in Mbps.

(Note, for gpuowl, there are more choices; https://www.mersenneforum.org/showpo...0&postcount=26, some of which might conceivably apply to prime95/mprime too, at least for the most adventurous. But I encourage users to stick with prime95 & mprime's built in PrimeNet API & supported features whenever practical.)


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2020-09-17 at 14:14 Reason: V30.3b6 general release update
#10 by kriesel, 2020-10-07, 06:50
Performing version upgrades

The most efficient method will depend on whether it's a single install or a fleet of them to be upgraded. Each method below leaves your results files, worktodo, log files, work-in-progress files, and configuration files in place and undisturbed. (But you should be doing regular system backups anyway.)

Single install:
Stop and exit the prime95 program to allow prime95 program files to be overwritten.
Download the zip file. Unzip it.
If necessary, move the new files into your working directory. Select replace if prompted. Restart the program in the working directory.

Multiple systems, with USB drive:
Download the zip file. Put it onto the USB drive. Unzip it there.
On each system:
Insert the USB stick.
Stop and exit the prime95 program to allow prime95 program files to be overwritten.
Copy the new version's files from the USB stick to the working directory, overwriting the old.
Start the program in the working directory.
"Eject" the USB drive. Its file explorer window will close.
Remove the USB stick.

Multiple systems, with network drive:
Download the zip file. Put it onto the network drive. Unzip it there.
On each system:
In file explorer, navigate to the update version prime95 folder on the network drive.
Stop and exit the prime95 program to allow prime95 program files to be overwritten.
Copy the new version's files from the network folder to the working directory, overwriting the old.
Start the program in the working directory.
Close the file explorer window for the update version folder.

It's possible to streamline the above somewhat with a bit of batch script.

Strictly speaking, it is not necessary to copy and overwrite files that have not changed from the previous version, but it does little harm.
Unneeded copying can be efficiently avoided by date sorting both source and destination folders, and only copying what's newer than the corresponding destination file.
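As a sketch of that idea (Python, with placeholder paths; stop prime95 before copying anything over it), the following copies from an unzipped new-version folder into the working directory only the files that are newer than, or missing from, the destination:
Code:
# Copy only files that are newer than (or missing from) the destination.
# src and dst are placeholder paths; adjust to the actual folders.
import os, shutil

src = r"D:\prime95_new"   # unzipped new-version folder (placeholder)
dst = r"C:\prime95"       # prime95 working directory   (placeholder)

for name in os.listdir(src):
    s, d = os.path.join(src, name), os.path.join(dst, name)
    if not os.path.isfile(s):
        continue
    if not os.path.exists(d) or os.path.getmtime(s) > os.path.getmtime(d):
        shutil.copy2(s, d)   # copy2 preserves the source timestamp
        print("updated", name)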

For more detail, quoted with some editing, from S485122 at https://mersenneforum.org/showpost.p...61&postcount=4
prime.txt contains the GIMPS user data,
local.txt contains the machine data,
worktodo.txt contains the current work (assigned or not),
at some times a file named prime.spl which contains the results not yet transmitted to the server might be present,
the work files pnnnnnnn mnnnnnnnn etc and their backup copies .bu, bu2, etc...
None of these files are in the prime95.zip archive and will thus not be overwritten. They are essential for continuity.
There are other user files that are not in the archive either, but they are less critical (results.txt, results.json.txt, prime.log, gwnum.txt, ...)
In other words, keep all other files in the folder, since they contain your user and machine data and preferences, your work in progress and results. The only files overwritten will be the program and version dependent files.


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2020-11-19 at 22:26
#11 by kriesel, 2020-11-15, 19:38
Effect of number of workers continued 2

Additional processor types:
FMA3-capable 6-core i7-8750H (no code running on the IGP at the time)
Xeon Phi 7250 (68 cores in one socket); see also https://www.mersenneforum.org/showthread.php?t=25767
Attached Files
File Type: pdf peregrine performance.pdf (34.0 KB, 8 views)
File Type: pdf xeon phi 7250 performance.pdf (105.7 KB, 3 views)

Last fiddled with by kriesel on 2020-11-19 at 22:19