mersenneforum.org  

Old 2018-05-24, 15:20   #1
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3×1,933 Posts
Mersenne Prime mostly-GPU Computing reference material

This thread is intended to be a repository for the most current available:
"what is it and what is it for"
"where to find it"
"how to use it"
"features and / or limits"
"bug lists, wish-list items, workarounds"
"where to find a discussion thread about it"
etc
relevant to searching for Mersenne primes on GPUs.
Some incidental CPU-related or general content may be present, but the focus is on GPU-based Mersenne computing.

Content specific to one gpu application will be grouped in a separate thread about that GPU application.
This thread is for general reference info or mixed info or comparisons between similar purpose applications.

It's intended to be a useful-information-rich zone. Like a reference library.

It's _not_ intended to be a place to chat about them.
There are many other existing threads, and places to create new threads, including elsewhere in this blog subforum, and also PM, for that.
(Suggestions are welcome. Discussion posts in this thread are not encouraged. Please use the reference material discussion thread http://www.mersenneforum.org/showthread.php?t=23383.) Off-topic posts may be moved or removed, to keep the reference threads clean, tidy, and useful.

This thread and other threads in this blog are the result of seeking information about these topics, and not finding it in current and organized clear form, when I began gpu computing in ~March 2017. I pulled together what I could find in the existing threads, and after posting some there, was invited to start assembling it here. It continues to evolve and grow as the applications change, and my understanding changes, and people ask questions and occasionally make contributions here or elsewhere. Much of this was written for my own use and to organize and preserve it before I forget it. Much also was written in interactions with others on threads outside this blog space, and then added here. Hopefully others will find it useful, such as to cut down on the confusion of new participants and the "Eternal September" problem for the experienced users helping them out, and the moderators. https://en.m.wikipedia.org/wiki/Eternal_September

Thanks to all who came before, posted thought-provoking items elsewhere, asked questions or provided feedback, and to Xyzzy who offered the blog. And of course, none of it would exist without the remarkable creators of the software we use.


How to get started in GPU computing for GIMPS

All the following is in the context of personal computing equipment you own or have properly authorized physical access to. (Cloud computing, especially Google Colaboratory, and some cellphones are separate possibilities.)

First, have a GPU to use. A discrete (separate) GPU in some sort of personal computer, usually a PCIe card, is recommended.
Check the specifications of your system. Identify what model GPU it is and whether it is AMD, Intel, or NVIDIA. Web search for the specifications of the GPU. Saving a bookmark or copy for later reference is recommended.
(It's possible to use an integrated graphics processor (IGP) that's part of the CPU package, but performance tends to be quite low on those, and many IGPs are very limited in what they can run. They also generally use up processor package power budget, and so reduce the performance of whatever GIMPS software may be running on the CPU: Mlucas, mprime, or prime95.)

Second, note that GPU applications are text-only, not GUI. Plan to run them in a Linux terminal session or Windows command prompt box that will remain after the application terminates (especially useful when it terminates due to an error). Output redirection from console to a log file, or use of a good tee program may be useful.
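
Where no tee program is at hand, the same effect can be had from a small wrapper script. The following is only a minimal sketch in Python: the function name run_and_tee is made up for illustration, and the program name and log file name you pass to it are your own choices, not anything the GIMPS applications require.

```python
import subprocess
import sys


def run_and_tee(command, log_path):
    """Run command, echoing each output line to the console and appending it to log_path.

    Illustrative helper (hypothetical name); returns the program's exit code.
    """
    with open(log_path, "a", encoding="utf-8") as log:
        proc = subprocess.Popen(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # fold error messages into the same stream
            text=True,
        )
        for line in proc.stdout:
            sys.stdout.write(line)
            log.write(line)
            log.flush()  # keep the log current in case the program dies abruptly
        return proc.wait()
```

For example, run_and_tee(["./gpuowl"], "gpuowl.log") would run a gpuowl executable found in the current folder (the path and log name here are assumptions) and keep everything it printed, including the last error message before an exit.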

Also, mprime and prime95 have support of the PrimeNet API integrated, making automatic progress reporting, assignment, and result reporting easy. GPU applications do not. There will be no progress reporting. You'll need to obtain assignments and report results manually, use an included separate python script for the applications that include that, or use a separate helper application for client management. See http://www.mersenneforum.org/showpos...92&postcount=3.

Third, choose what you want to run. Unlike prime95, which houses many computation types in one application, GPU software is often one program per computation type: trial factoring (TF), P-1 factoring, Lucas-Lehmer testing (LL) or probably-prime testing (PRP).

The minimum requirements for a GPU will depend on what software you plan to run, and conversely the versions and software compatible will depend on what model the GPU is.

Which software you require will depend on the planned calculation type and whether your GPU is NVIDIA based, AMD based, or other.

If you want to run the computation type that is best suited for your GPU, do TF on GPUs with strong integer performance and weak double precision floating point such as the RTX20xx and GTX16xx. Do P-1, LL, PRP on GPUs with relatively stronger double precision floating point such as most Teslas, Radeon VII, Vega, RX4xx or RX5xx.
Conversely, if you want to run a specific computation type and will be buying a GPU to do so, look for performance in the relevant category. RTX20xx and GTX16xx are good buys for TF. Radeon VII is by far the best buy for P-1, LL, PRP. Their power requirements are substantial. Check your system for compatibility.

Generally you will need to ensure that the GPU, driver version, any library files required, and application software are mutually compatible. Otherwise there will be errors. The particulars vary by application, OS, & GPU model. (Avoid mixing very old and new cards in the same system. They can have mutually incompatible requirements.)

If using NVIDIA and CUDA based applications, see https://en.wikipedia.org/wiki/CUDA#GPUs_supported, and note that the latest CUDA version or driver does not always give the best performance for a given GPU, and eventually, as versions progress, may not even be compatible with a given GPU. Read. Test.

If you want to do trial factoring:
Do normal GIMPS TF on a GPU, not a CPU. GPUs are so good at TF that running it on a CPU is a waste of CPU time by comparison.
If you have an NVIDIA GPU, mfaktc is the application. If the GPU is quite old, you may need a CUDA8 or earlier version of mfaktc. GTX 10xx GPUs require CUDA8; GTX16xx and RTX20xx require CUDA10 driver, libraries, and application software.

If it's an AMD GPU or Intel IGP, you want Mfakto and a working install of OpenCL for the device. Running Mfakto may cost half the CPU GIMPS application performance, but produce more total GHzD/day. Example: on an i7-7500U, prime95 drops from 9.6 to 4.8 GHzD/day, but Mfakto yields 20, for an increase from 9.6 to 24.8 total. With prime95 stopped, Mfakto alone yields 18.
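
The arithmetic of that example can be checked with a few lines of Python. This is just a sketch using the i7-7500U figures quoted above; the function name is made up for illustration, and your own numbers will differ.

```python
def combined_throughput(cpu_alone, cpu_shared, gpu_shared):
    """Return (total, gain) in GHzD/day when CPU and IGP work run together.

    cpu_alone:  CPU application throughput with the IGP idle
    cpu_shared: CPU application throughput while the IGP application runs
    gpu_shared: IGP application throughput while the CPU application runs
    """
    total = cpu_shared + gpu_shared
    return total, total - cpu_alone


# Figures from the i7-7500U example: prime95 9.6 -> 4.8, Mfakto 20.
total, gain = combined_throughput(9.6, 4.8, 20.0)
print(f"total {total:.1f} GHzD/day, gain {gain:.1f} GHzD/day over prime95 alone")
```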

If you want to do P-1 factoring, you can consider CUDAPm1 or Gpuowl.
CUDAPm1 can run on old NVIDIA GPU models that are not capable of running Gpuowl. It is somewhat limited in exponent range and reliability.
Recent versions of Gpuowl can run P-1 on AMD or NVIDIA GPUs with a suitable driver and OpenCL installation.

If you want to run primality tests, there's a choice of LL or PRP (Lucas-Lehmer conclusive test or probably prime test).
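
For readers who have not seen it, the Lucas-Lehmer test itself is short. The sketch below uses plain Python integers purely to show the recurrence and what a 64-bit residue is; it is not how CUDALucas or Gpuowl compute (they use FFT-based multiplication on the GPU), and the function name and the report_every parameter are made up for illustration.

```python
def lucas_lehmer(p, report_every=0):
    """Lucas-Lehmer test of the Mersenne number 2**p - 1, for an odd prime p.

    Returns (is_prime, res64), where res64 is the low 64 bits of the final
    value of the recurrence. Optionally prints interim 64-bit residues.
    """
    m = (1 << p) - 1
    mask64 = (1 << 64) - 1
    s = 4
    for i in range(1, p - 1):  # p - 2 squarings in total
        s = (s * s - 2) % m
        if report_every and i % report_every == 0:
            print(f"M{p} iteration {i}: res64 {s & mask64:016X}")
    return s == 0, s & mask64


for p in (7, 11, 13, 31):
    is_prime, res64 = lucas_lehmer(p)
    print(f"M{p}: {'prime' if is_prime else 'composite'}, res64 {res64:016X}")
```

A zero final value means the Mersenne number is prime; any nonzero residue means composite, and two independent runs of the same exponent should agree on the residue.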

Old NVIDIA GPUs may be incompatible with running Gpuowl and so require using CUDALucas. Running that would require a compatible GPU, driver, cuFFT library, CUDA runtime (cudart) library, and application version. Old NVIDIA GPUs with compute capability 2.x are unlikely to support Gpuowl, so are CUDALucas candidates. These old cards are slow and power-inefficient, and are candidates for replacement.

Newer NVIDIA GPUs, and most AMD GPUs, can run recent versions of Gpuowl that offer P-1, LL, and PRP. Gpuowl requires a suitable driver and OpenCL installation. Recent Gpuowl versions support both LL and PRP. PRP is recommended for new tests because of the excellent reliability of the Gerbicz error check. Gpuowl is generally faster than CUDALucas on GPUs that can run both. Proof-capable versions of Gpuowl nearly eliminate the double-checking work for PRP, making it effectively double the speed of LL, LLDC, and occasional LLTC.

Some IGPs can run very early versions of Gpuowl (v1.9 or earlier in my testing). It's generally not worthwhile, costing much more reduction in cpu performance than the progress it produces on the IGP. That's a drawback of IGPs not Gpuowl. Running on an IGP might be a useful or interesting generic learning exercise before buying a GPU.

If you want to run Gpuowl on a system that mostly or entirely does GIMPS computations, you may want to consider the performance advantages of Linux with the ROCm driver. (That's what it's developed on.) There are reports of over 510 GHzD/day at 5M fft length on Linux with ROCm, with good Radeon VIIs that can have their memory clocked to 1200 MHz. That is faster than on Windows, where 448 GHzD/day at 5M fft length with a memory clock of 1120 MHz extrapolates linearly to ~480 GHzD/day at 1200 MHz.
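
The extrapolation above is simple proportional scaling, sketched here in Python. It assumes throughput scales linearly with memory clock, which is only an approximation; the function name is made up for illustration.

```python
def extrapolate_linear(rate, clock_measured_mhz, clock_target_mhz):
    """Scale a measured throughput to another memory clock, assuming linearity."""
    return rate * clock_target_mhz / clock_measured_mhz


# Windows figure quoted above: 448 GHzD/day at 1120 MHz memory clock.
windows_at_1200 = extrapolate_linear(448.0, 1120.0, 1200.0)
linux_reported = 510.0  # reported on Linux with ROCm at 1200 MHz

print(f"Windows, extrapolated to 1200 MHz: {windows_at_1200:.0f} GHzD/day")
print(f"Linux/ROCm advantage: {100 * (linux_reported / windows_at_1200 - 1):.1f}%")
```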

Finally, note that where you install GIMPS applications matters. The folder must have adequate permissions set to allow the program, save files, log files, worktodo file, batch files or shell scripts, etc to be created, read, written, modified, executed. That generally makes a subfolder of Program Files on Windows systems a poor choice. The drive root folder can also be a problem. Install in a user owned folder and set permissions appropriately.
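
A quick way to find out whether a candidate folder will cause permission trouble is to try creating, writing, and deleting a file in it before installing anything. A minimal sketch in Python follows; the function name is made up for illustration, and it only checks file creation and writing, not execute permission.

```python
import tempfile


def folder_is_usable(folder):
    """Return True if a file can be created, written, and removed in folder."""
    try:
        # The temporary file is deleted automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=folder, mode="w") as probe:
            probe.write("GIMPS permission probe\n")
        return True
    except OSError:
        # Covers a missing folder as well as insufficient permissions.
        return False
```

Calling it on the folder you intend to install into should return True; on a folder such as a subfolder of Program Files, run from an ordinary user account, it would typically return False.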

Choose an application and computation type and OS.

Next, see post two of this thread below, and the application-specific threads for more details and application-specific how-to directions following:
CUDALucas
CUDAPm1
Gpuowl on Linux, on Windows
Mfaktc
Mfakto
(more links to come...eventually)


Table of contents for generic reference material (this thread)
  1. Intro (this post)
  2. Available Mersenne Prime hunting software http://www.mersenneforum.org/showpos...91&postcount=2
  3. Available Mersenne prime hunting client management software http://www.mersenneforum.org/showpos...92&postcount=3
  4. Disclaimer http://www.mersenneforum.org/showpos...04&postcount=4
  5. Ancestry of available software http://www.mersenneforum.org/showpos...04&postcount=5
  6. Utilities for GPU Computing, etc. http://www.mersenneforum.org/showpos...74&postcount=6
  7. List of fft lengths http://www.mersenneforum.org/showpos...75&postcount=7
  8. Four primality test programs' performance charted together (clLucas, CUDALucas, gpulucas, and gpuOwL) http://www.mersenneforum.org/showpos...76&postcount=8
  9. Mersenne prime hunt work coordination sites versus type and exponent http://www.mersenneforum.org/showpos...11&postcount=9
  10. Devcon (Automating recovery from Windows TDR events for GPUs) http://www.mersenneforum.org/showpos...3&postcount=10
  11. Table of megadigits; what Mersenne exponents have various order of magnitude number of decimal digits http://www.mersenneforum.org/showpos...4&postcount=11
  12. TF & P-1 optimization/tradeoff with each other and primality testing http://www.mersenneforum.org/showpos...7&postcount=12
  13. Assorted handy links http://www.mersenneforum.org/showpos...4&postcount=13
  14. Found a new prime? Really? What next? http://www.mersenneforum.org/showpos...5&postcount=14
  15. NVIDIA-smi http://www.mersenneforum.org/showpos...4&postcount=15
  16. TF & LL GhzD/day ratings & ratios and SP/DP ratios for certain GPUs https://www.mersenneforum.org/showpo...7&postcount=16
  17. P-1 bounds determination https://www.mersenneforum.org/showpo...4&postcount=17
  18. What limits trial factoring? https://www.mersenneforum.org/showpo...4&postcount=18
  19. Error rates https://www.mersenneforum.org/showpo...3&postcount=19
  20. Costs https://www.mersenneforum.org/showpo...8&postcount=20
  21. Reserving a specific exponent https://www.mersenneforum.org/showpo...4&postcount=21
  22. Worktodo entry formats https://www.mersenneforum.org/showpo...8&postcount=22
  23. GPUto72 and PrimeNet P-1 bounds https://www.mersenneforum.org/showpost.php?p=522257&postcount=23
  24. Moving work in progress https://www.mersenneforum.org/showpo...3&postcount=24
  25. GPU P-1 applicability https://www.mersenneforum.org/showpo...3&postcount=25
  26. Save file (in)compatibility https://www.mersenneforum.org/showpo...8&postcount=26
  27. GPU benchmarks https://www.mersenneforum.org/showpo...4&postcount=27
  28. Result formats https://www.mersenneforum.org/showpo...9&postcount=28
  29. Application vs. operating system availability & compatibility https://www.mersenneforum.org/showpo...3&postcount=29
  30. PrimeNet P-1 bounds https://www.mersenneforum.org/showpo...9&postcount=30
  31. P-1 selftest candidates https://www.mersenneforum.org/showpo...8&postcount=31
  32. GPU serial numbers or other stable unique ids https://www.mersenneforum.org/showpo...8&postcount=32
  33. Result formats accepted by mersenne.* https://www.mersenneforum.org/showpo...9&postcount=33
  34. etc tbd
Discussion thread
To keep the reference material threads as usable as possible, please put discussion posts in http://www.mersenneforum.org/showthread.php?t=23383

Looking for...
Information I'm looking for, and responses, here: http://www.mersenneforum.org/showthread.php?t=23407

System management
System management notes http://www.mersenneforum.org/showthread.php?t=23415

Background:
My attempt at writing a background piece https://www.mersenneforum.org/showpo...31&postcount=1
Ernst Mayer's ODROID article containing useful background information https://magazine.odroid.com/article/...tical-history/

NOTE:
Unlike CPU-centric applications like prime95/mprime, Mlucas, and gmp-ecm, which combine support of multiple processor types, OS versions, and usually of multiple computation types into one software program, the GPU applications tend to be specific to both a single computation type and a single hardware category. Trial factoring requires a different program than P-1 factoring; trial factoring on CUDA (NVIDIA) GPU hardware requires a different program than trial factoring on OpenCL hardware (AMD GPUs, Intel IGPs). There are more choices than the following links indicate; see the second post's attachment. Further, GPU applications have varying CUDA or compute-capability level requirements or OpenCL version requirements and dialects, hardware requirements, OS compatibility, display driver requirements, etc. The lone exception so far is gpuOwL, which has included, at various times, LL, Jacobi check, TF, P-1, PRP, and Gerbicz check; the current version includes P-1, PRP, and Gerbicz check.

Links to other, application-specific reference threads (alphabetically by application name)
clLucas Lucas-Lehmer primality testing with OpenCL on GPUs http://www.mersenneforum.org/showthread.php?t=23401
CUDALucas Lucas-Lehmer primality testing with CUDA on GPUs http://www.mersenneforum.org/showthread.php?t=23387
CUDAPm1 P-1 factoring with CUDA on GPUs http://www.mersenneforum.org/showthread.php?t=23389
gpuOwL PRP primality testing on OpenCL GPUs, or P-1 factoring or Lucas-Lehmer testing http://www.mersenneforum.org/showthread.php?t=23391
Mfaktc CUDA based trial factoring on GPUs http://www.mersenneforum.org/showthread.php?t=23386
Mfakto OpenCL based trial factoring on GPUs http://www.mersenneforum.org/showthread.php?t=23394
Mlucas LL testing on a variety of CPUs https://www.mersenneforum.org/showthread.php?t=23427
prime95 primality testing or factoring on Intel or AMD cpus https://www.mersenneforum.org/showthread.php?t=23900
older software (predating GIMPS and GPUs) https://www.mersenneforum.org/showthread.php?t=24047

Links to elsewhere:
Error rate plots versus exponent size https://www.mersenneforum.org/showth...=10377&page=10


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2021-10-03 at 23:33 Reason: add result formats accepted by mersenne.*
Old 2018-05-24, 16:11   #2
kriesel
 
Available Mersenne Prime hunting software

The attachment is a pdf tabulating GIMPS program names, requirements, limits/capabilities, download locations, discussion forum threads etc. versus computing hardware and computation type. It covers both gpu-oriented software and cpu-only software. It is periodically updated as changes come to my attention.
Please send any additions, corrections, suggestions etc by PM to kriesel.

New participants are encouraged to read the new participant reference thread https://mersenneforum.org/showthread.php?t=24588

(Content of the available software tabulation was developed with the help of various posters at http://www.mersenneforum.org/showthread.php?t=22450 and some answers to questions by some of the code authors.)


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: pdf Mersenne prime hunting software.pdf (130.3 KB, 16 views)

Last fiddled with by kriesel on 2021-10-01 at 03:17 Reason: updated attachment: mmff, Pari/GP
Old 2018-05-24, 16:19   #3
kriesel
 
Available Mersenne prime hunting client management software

While it is not necessary to use such separate client management software to run single or multiple GPUs on GIMPS tasks, some participants may find it useful to do so.

The attachment is a pdf describing available software for automatically obtaining mersenne related work and/or reporting results, etc. While primarily oriented to gpu applications, it includes information on support for cpu oriented applications prime95, mprime, and Mlucas also.

While prime95 and mprime are very well supported by an integral PrimeNet API implementation, there is also reportedly a separate command line monitor for those who want frequent updates on status. See https://www.mersenneforum.org/showthread.php?t=25007

See also https://mersenneforum.org/showthread...144#post572144 which is a new development mostly oriented to Google Colab use, but partly applicable also to running Mlucas and CUDALucas (requires Python).

Please send any additions, corrections, suggestions etc by PM to kriesel


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: pdf available client management software.pdf (34.1 KB, 361 views)

Last fiddled with by kriesel on 2021-02-21 at 15:49 Reason: added link to danc/tdulcet's thread announcing Colab notebooks and Python scripts
Old 2018-05-24, 16:41   #4
kriesel
 

Disclaimer

The tables and other information I've been putting together (or may add or modify in the future) are not a personal endorsement of everything or anything in them. Don't sue or curse me or GIMPS or its officers if something undesired occurs after loading any of the mentioned software on a system, or if information offered in good faith turns out to be incorrect. I'm just sharing information I've managed to gather after spending numerous hours reading through forum threads and by other means. Make your own informed decision what to use or not use. As with all downloaded software, the end user assumes any risk. There are no warranties express or implied that I'm aware of.

Draft criteria for inclusion in Mersenne prime hunting software package table, client management table, etc.:

Benign function: It actually does what the author claims it does and accomplishes the factoring or testing approach (within normal bounds, including bugs and documentation errors' effects) and it does not do things that are objectionable (such as interfere with other programs, destabilize or damage system, make console unresponsive or hang the system, crack passwords, transmit user files, mine bitcoin, send spam, install or contain data extortion virus or other malware, search for data useful for blackmail, join a botnet, participate in DDOS, LAN-scan, etc). It can handle the ranges of exponents of interest currently (GIMPS wavefront). It's preferred it also currently supports ranges of exponents extending at least a year into the future at current rates of progress. That allows time for testing for correct function in advance before the wavefront arrives there.

Speed: fast enough relative to any available alternatives that could consume the same computing resources, that running the program does not constitute a large waste of resources. Faster than existing alternatives on similar hardware at high accuracy and reliability is desired.

Accuracy: nothing's perfect, and lengthy primality test, probable-prime test, or factoring operations are exceptionally unforgiving of error, so high effort to ensure accuracy matters greatly. Software should implement self test to verify itself and the hardware it's running on, and implement various on-the-fly error checking and verification such as round-off error checking, detection of illegal output values, status output including interim 64-bit residues, etc.

Stable: Can run for weeks or longer without attention. Stability issues have known effective workarounds. (An example is the NVIDIA driver restart issue and the combination of checkpointing, DEVCON.exe, and batch file wrappers.)

Robust: Saves interim results periodically and can resume from saved interim results in case of later nonrecoverable error. Includes input parameter checking, error handling.

Validation: Outputs should be available in plain text to make easy validation of correct operation. Documentation of data formats of interim save files for comparison of full length interim residues or other computations in progress to results of other programs and to theoretically correct values is a plus.

Trusted: Open Source is the best bet here and highly recommended. Authors willing to put their own full names on it help create confidence. Aliases or closed source or concealed documentation or encryption of code or encryption of communication can have the opposite effect.

Availability: It is available on a known server for free download. License restrictions are minimal and publicly available before download and installation are completed, for an informed decision at the outset or very early. Ideally the documentation can be downloaded separately from the program or installer.

Work management: Optionally provides for connection with work assignment servers. Optional scripting to distribute work or report results automatically. Provision for queuing up work locally is required, enabling manual operation and operation throughout network access interruptions. Provides for resuming or launching subsequent exponent tasks upon completion of one in progress.

Logging: logging of results to file is required. Logging of program console activity is desired but not a requirement. For software lacking built-in screen output logging, end user use of tee or redirection from a terminal window or command processor window is suggested.

Platform requirements: runs on hardware and operating system(s) which are numerous, and resource requirements are not exceptionally demanding or exotic.

Support: there's a user community, a forum, source code available online, executable(s) easily available, patches easily available, currently maintained software and documentation, clear accurate and ample documentation, online documentation, a wiki, or a responsive capable involved friendly developer. More is better. All the preceding is best.

Tuning: program should implement sensible defaults, and provide for user modification of tune. Some people enjoy tuning, some want set and forget. New hardware not designed when the software was written may benefit from re-tuning.

User interface: command line options and ini file are the minimum. An interactive keyboard mode is helpful for tuning and debugging. A GUI is a plus.

Other: what am I omitting that is of importance to the current or potential users?


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2019-11-15 at 22:25 Reason: typos, updates
Old 2018-05-24, 18:13   #5
kriesel
 
Ancestry of available software

Some software was derived from other earlier software (or multiple others). The attached crude diagram shows my understanding of ancestor/descendant relationships, gleaned from sources such as source code comments/credits, and web pages. It's intended to show ancestry of code, not concepts. Code shown without connecting lines is believed to have been developed independently. The thin line between lucdwt and prime95 is intended to represent prime95's adverb "loosely".

As usual, please respond with any additions, corrections, comments.


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: pdf parentage.pdf (7.8 KB, 333 views)

Last fiddled with by kriesel on 2019-11-15 at 22:26
Old 2018-05-27, 20:33   #6
kriesel
 
Utilities for GPU computing etc.

Here are some things I've run across that I found useful or interesting, or have seen recommended by others.

Versions stated for each, and links, are current as of March 22 2019 or better.
Subject to change without notice, no warranty express or implied, availability versus OS etc will vary, don't look a gift horse in the mouth, ...

URLs are clickable in Acrobat pdf reader.

See also for gpu monitoring in linux, thread https://www.mersenneforum.org/showthread.php?t=23361

It can be a bit confusing which OpenCL device is which device number or platform number on a multi-platform or multi-GPU system, especially since a CPU and IGP may add both devices and a platform. Numbering changes when one platform is uninstalled or malfunctioning. lsgpu is a simple scan, enumeration and summary program. A modified version with source, doc, url, and Windows exe is attached in lsgpu.7z.

Suggestions, additions, corrections invited by PM to kriesel.

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: pdf utilities.pdf (35.6 KB, 364 views)
File Type: 7z lsgpu.7z (229.9 KB, 173 views)

Last fiddled with by kriesel on 2020-10-27 at 01:18 Reason: added lsgpu & .7z file
Old 2018-05-27, 20:46   #7
kriesel
 
list of fft lengths

A table of 7-smooth numbers that are multiples of 2^10, from 2^10 to 2^26 (1k to 65536k or 64M), is provided in attachment "fft lengths.pdf".

This is the list from which code authors are likely to select lengths for fft code implementation, for primality testing or P-1 computation, of significance for running exponents within the mersenne.org exponent range p < 10^9. (Some software implements many of these, and some implements few or just one.)

It's possible to go higher, but there's not much point to doing so currently, since the primality test computation would be likely to take longer than the hardware lifetime.

Some GPU applications are coded for up to 128M or higher (at least CUDALucas and CUDAPm1). gpuOwL has been extended to a subset of the 7-smooth numbers with a maximum fft length of up to 192M depending on version number; it's currently up to 120M. Mlucas is coded for up to 256M (and up to 512M for Fermat numbers).
Additional tables covering up to 128M, 256M and 512M are also attached.

Recently up to 13-smooth is being used in gpuowl's fft lengths. Mlucas lists up to 31-smooth in its source code.
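
A list of this kind is easy to regenerate. The following Python sketch enumerates k-smooth multipliers and scales them by 2^10; the function name is made up for illustration, and I have not checked its output line by line against the attached pdfs, so treat it as a way to explore the idea rather than a replacement for them.

```python
def smooth_multipliers(limit, primes=(2, 3, 5, 7)):
    """Return the sorted integers in 1..limit whose prime factors all lie in primes."""
    found = {1}
    frontier = [1]
    while frontier:
        n = frontier.pop()
        for p in primes:
            m = n * p
            if m <= limit and m not in found:
                found.add(m)
                frontier.append(m)
    return sorted(found)


K = 1 << 10  # 2^10 = 1024, the "k" in lengths such as 4608k

# 7-smooth candidate fft lengths from 1k through 65536k (64M)
lengths = [m * K for m in smooth_multipliers(65536)]
print(f"{len(lengths)} candidate lengths, smallest {lengths[0]}, largest {lengths[-1]}")

# Allowing factors of 11 and 13 as well gives the 13-smooth set
lengths_13 = [m * K for m in smooth_multipliers(65536, primes=(2, 3, 5, 7, 11, 13))]
print(f"{len(lengths_13)} candidate lengths when 13-smooth is allowed")
```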


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: pdf fft lengths.pdf (54.1 KB, 415 views)
File Type: pdf fft lengths above 64Mto 128M.pdf (10.6 KB, 248 views)
File Type: pdf fft lengths above 128M to 256M.pdf (11.3 KB, 248 views)
File Type: pdf fft lengths above 256M to 512M.pdf (13.4 KB, 244 views)

Last fiddled with by kriesel on 2020-10-06 at 15:10 Reason: update
Old 2018-05-27, 20:54   #8
kriesel
 
Four primality test programs' performance charted together

clLucas, CUDALucas, gpulucas, and gpuOwL compared.
Note the speed disparity of the hardware when interpreting these values: the GTX480 CUDA GPU (at 244W) has 3.6-3.7 times the hardware performance rating of the low-power OpenCL card (50W). Normalizing to equal speed hardware, gpuOwL seems to perform fastest by a comfortable margin.

A fairer hardware comparison would put clLucas and gpuOwl on an RX480. But I had none available at the time. See also http://www.mersenneforum.org/showpos...&postcount=386 showing the RX480 3.4-3.6 times faster than the RX550 on the same exponents.

Note also, that gpuowl v1.9 is what was measured and compared, and that many performance improvements in gpuowl have been implemented since.


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: pdf primality tester performance comparison.pdf (15.1 KB, 364 views)

Last fiddled with by kriesel on 2020-08-27 at 17:30
Old 2018-05-28, 15:52   #9
kriesel
 
Mersenne prime hunt work coordination sites vs type and exponent

Here's another condensation of information I gathered along the way. Please reply in the discussion thread or by PM with any corrections or additions you may have. This version includes a link to the archived version of Will Edgington's Mersenne-related site, which is the only public location I know of with broad coverage of Mersenne number factor data for exponents > 2^32. Unfortunately the zip files there are truncated to 128KB on download and so not usable.

As always, additions, corrections or suggestions are invited by PM to kriesel


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: pdf work coordination sites.pdf (37.3 KB, 5 views)

Last fiddled with by kriesel on 2021-10-03 at 23:04 Reason: attachment updated for P+1 introduction, ECM limit change, etc
Old 2018-05-28, 16:17   #10
kriesel
 
Devcon (Automating recovery from Windows TDR events for GPUs)

Windows has a provision, Timeout Detection and Recovery (TDR), for detecting delays in display devices responding to it. Above a certain threshold delay, it concludes the device is hung and restarts it, even if the GPU involved is only running math code, not a display, and may not have a display physically connected. (You can find such restarts in the system event log; for example, Event ID 4101, with text something like "Display driver nvlddmkm stopped responding and has successfully recovered." for NVIDIA. AMD GPUs are also affected: "Display driver amdkmdap stopped responding and has successfully recovered." IGPs may also be affected, and at smaller work chunks, since they are usually quite slow.) The purpose of such restarts is to avoid hung consoles or blue-screen OS crashes. Keeping individual GPU tasks small enough to complete within the TDR timeout is a known issue in the general GPU software developer community.

On Windows 10, an XFX Radeon VII has the issue and hangs when running gpuowl unless a very small fft length is used. Older NVIDIA GPUs with low compute capability levels (for example, the GTX480 and Quadro 4000, which are Compute Capability 2.0) also have issues with these display restarts. As GIMPS work advances to larger exponents, computations take longer. Windows detects a long time for the GPU to respond, interprets it as a hung device, and stops and restarts it. Unfortunately, it does this in a way that does not reconnect to existing sessions of GPU-Z or other utilities, or to CUDA applications such as CUDALucas, CUDAPm1, or mfaktc, or to the OpenCL equivalent applications or gpuowl. (Reportedly, this issue does not arise on Linux.)

Historically, the applications have been wrapped in batch scripts to restart them, and adding a TdrDelay value in the registry larger than the implicit default has been recommended (see https://docs.microsoft.com/en-us/win...-registry-keys). However, these are incomplete solutions that often fail. Recently I've found increasing TdrDelay is not enough. On higher bit levels of large exponents, and old slow GPUs, increasing TdrDdiDelay seems to be needed also. Another approach is to run an older driver than the level at which the issue showed up (on NVIDIA, below about 300). That may be impractical if there's a newer card also present that requires a newer driver. Sometimes, while Windows views the GPU as working properly, running applications such as GPU-Z or a newly started GPU computing application can't access the GPU. A system restart clears that situation up, but the restart is disruptive to GPU applications running on other GPUs, prime95, and anything else running on the system, and requires operator intervention to stop and restart it all.
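
For reference, TdrDelay and TdrDdiDelay are REG_DWORD values, in seconds, under HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers. What follows is a minimal Python sketch, not something from the applications themselves, that only builds and prints the reg.exe command lines so they can be reviewed before being pasted into an elevated prompt. The function name and the values 60 and 60 are illustrative assumptions, not recommended settings; a restart is normally needed before changed values take effect.

```python
# Sketch: build (but do not run) reg.exe commands for the two TDR timeouts.
# Key path and value names are the documented TDR registry settings;
# the numbers passed in below are placeholders only.
GRAPHICS_DRIVERS_KEY = r"HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"


def tdr_reg_commands(tdr_delay_s, tdr_ddi_delay_s):
    """Return reg.exe command lines setting TdrDelay and TdrDdiDelay (seconds)."""
    values = {"TdrDelay": tdr_delay_s, "TdrDdiDelay": tdr_ddi_delay_s}
    commands = []
    for name, seconds in values.items():
        if not isinstance(seconds, int) or seconds <= 0:
            raise ValueError(f"{name} must be a positive whole number of seconds")
        commands.append(
            f'reg add "{GRAPHICS_DRIVERS_KEY}" /v {name} /t REG_DWORD /d {seconds} /f'
        )
    return commands


if __name__ == "__main__":
    # 60 and 60 are illustrative placeholders, not recommended settings.
    for line in tdr_reg_commands(60, 60):
        print(line)
```

Printing rather than executing keeps the registry change a deliberate manual step, which seems prudent for a setting that affects the whole display stack.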

But, there is another approach. In Windows' Device Manager, disable and reenable the errant display device to avoid a system restart. I've seen this reenable access to sensor readings of a GPU in a preexisting GPU-Z session, as well as make the GPU available again for use by a newly launched CUDA application. The system restart is avoided, allowing prime95 and other GPUs' application instances to continue uninterrupted and undisturbed.

This doesn't always work; sometimes the GPU can not be reenabled. It may be that the GPU is overheated, or the power supply is at its limit, etc.

The device disable/reenable can be done from the command line or a batch script, minimizing idle time and operator intervention, using the appropriate version of devcon.exe (available in Visual Studio, the Windows Driver Kit, etc) per
https://superuser.com/questions/4290...a-command-line
https://docs.microsoft.com/en-us/win...devtest/devcon
for the version of Windows installed.

Such a script may benefit if it includes delays between commands. Some Windows OS versions don't support the timeout command. On versions where timeout generates an error, the delay can be provided instead by a conditional ping to a nonexistent address (preferably in your own LAN address space, for stability), appending three zeros to the number of seconds to wait, since ping's -w timeout is in milliseconds.
Code:
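rem no trailing spaces after the values, or %delay%000 expands to "3 000"
set delay=3
set nonexist=192.168.2.3
timeout /t %delay%
rem fall back to ping when timeout is unavailable or fails
if errorlevel 1 ping %nonexist% -n 1 -w %delay%000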
Another piece of the puzzle is the device id for devcon.exe in the correct form.

A further piece of the puzzle is getting the batch file containing the devcon command to run as administrator. Otherwise devcon will run at too low a privilege level and list devices but not control them, even when the batch file is launched by an account with administrator permissions. (Most online how-tos for it omit that crucial little detail.) For getting devcon to work, see http://classicshell.net/forum/viewtopic.php?f=5&t=423 particularly the requirement to create a shortcut to force the batch file to run as administrator, and https://docs.microsoft.com/en-us/win...local_computer to find the names of your GPU device(s).

So, putting the pieces together:
a) make the batch file that runs the CUDA app (CUDALucas, CUDAPm1) or OpenCL app that is affected by driver timeouts. In my experience mfaktc and mfakto seem less often affected. Mfaktx seems affected in the higher bit levels of factoring, where run times are quite long per class.
b) make a shortcut to the batch file, and in the advanced tab of the shortcut properties, set it to run as administrator
c) modify the shortcut to cmd /k batchfile so the console window stays open after the batch file exits and the flow and any error messages can be examined
d) install devcon.exe on the system, either in the working directory of the CUDA app or in \windows\system32, or somewhere else that's in your path
e) use devcon.exe interactively to obtain the unique device ID for the GPU to be controlled
f) modify the batch file to use the unique device ID obtained, in the disable and enable lines, and adjust other settings as needed. Be sure to use enough of the ID that it identifies a unique GPU device matching the CUDA device number affected.
g) secure your system so that running a batch file at elevated privilege is an acceptable risk
h) make adjustments to TDR related registry settings as needed; increasing TdrDelay helps some; increasing TdrDdiDelay may also help. See https://docs.microsoft.com/en-us/win...-registry-keys for the list and defaults
i) test
j) use and enjoy

Draft batch file, to be run from the high-privilege shortcut:
Code:
set delay=1
set maxdelay=10
set count=0
set countmax=5
set exe=cudaPm1_win64_20130923_CUDA_55.exe
set model=GeForce GTX 480
set dev=1
set nonexist=192.168.2.3
cd "\Users\ken\My Documents\cudapm1"
echo worktodo.txt >>cudapm1.txt
goto loop

: change the above set commands etc & quoted device identifiers below, to suit your situation and preferences
: following is what does the production work, from the worktodo file, putting results in the results file and appending history in the cudapm1.txt log file
: limited looping may be useful in some cases (such as Windows TDR events); too high a count or no limit mostly pointlessly inflates log size, especially if worktodo is emptied
:loop
echo batch wrapper reports (re)launch of %exe% on %model% at %date% %time% reset count %count% of max %countmax% >>cudapm1.txt
title %computername% model %model% %exe% dev %dev% reset count %count% (%0)
%exe% -d %dev% >>cudapm1.txt
echo batch wrapper reports exit at %date% %time% >>cudapm1.txt
echo attempting disable/enable cycle on gpu device >>cudapm1.txt
devcon disable "PCI\VEN_10DE&DEV_06C0&SUBSYS_14803842" >>cudapm1.txt
devcon enable "PCI\VEN_10DE&DEV_06C0&SUBSYS_14803842" >>cudapm1.txt
timeout /T %delay%
if errorlevel 1 ping %nonexist% -n 1 -w %delay%000
if %delay% lss %maxdelay% set /A delay=delay*2
if %delay% gtr %maxdelay% set delay=%maxdelay%
set /A count=count+1
if %count% lss %countmax% goto loop

echo at %date% %time% countmax=%countmax% reached, exiting batch file
Possibly at some point the equivalent could be built into the applications' code. For now, there is this batch file workaround.
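
To see what the delay and count settings in the draft batch file amount to, here is a small Python sketch that mirrors only its loop arithmetic (wait, double the delay up to maxdelay, count the pass, repeat while count is below countmax). The function name wait_schedule is made up for illustration; nothing here launches an application or touches a device.

```python
def wait_schedule(delay=1, maxdelay=10, countmax=5):
    """Seconds waited after each relaunch, mirroring the batch loop's arithmetic."""
    waits = []
    count = 0
    while True:
        waits.append(delay)      # timeout /T %delay%
        if delay < maxdelay:     # if %delay% lss %maxdelay% set /A delay=delay*2
            delay *= 2
        if delay > maxdelay:     # if %delay% gtr %maxdelay% set delay=%maxdelay%
            delay = maxdelay
        count += 1               # set /A count=count+1
        if not count < countmax: # if %count% lss %countmax% goto loop
            break
    return waits


if __name__ == "__main__":
    schedule = wait_schedule()
    print(schedule)       # [1, 2, 4, 8, 10]
    print(sum(schedule))  # 25
```

With the defaults shown in the draft, the wrapper relaunches the application five times and spends 25 seconds in total waiting between attempts, so raising countmax or maxdelay is the way to ride out longer outages.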

All the above relates to consumer grade NVIDIA GPUs and the WDDM. The Tesla family can use a different driver mode, Tesla Compute Cluster (TCC) mode, for nondisplay compute-only GPUs, that does not have the TDR issue. That different driver mode and model (WDM) are not applicable or available for the consumer grade GPUs such as the GeForce models. (For more information, see the "Use a Suitable Driver Model" section in a recent version of "NVIDIA CUDA GETTING STARTED GUIDE FOR MICROSOFT WINDOWS")


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2021-07-12 at 18:29 Reason: removed "old" qualifier
Old 2018-06-01, 20:08   #11
kriesel
 
Default Table of Megadigits

What's the exponent required for 10, 100 or 1000 megadigit Mersenne numbers? What are the nearest prime exponents, and the numbers of digits of their corresponding Mersenne numbers?
How were those calculated? See the attachment for a handy list. It's been checked against
http://oeis.org/A034887 which gives the formula floor(n*log(2)/log(10)) + 1 for the number of decimal digits of 2^n.

Also included is a rough ballpark estimate of what's feasible on a GTX1070 in CUDALucas 2.06.
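
The digit-count formula is easy to try out directly. The following Python sketch is not how the attachment was produced; it is just one way to reproduce the kind of numbers in it. The function names are made up for illustration, the prime test is plain trial division (adequate at these sizes), and double-precision floating point is assumed to be accurate enough, which holds as long as p*log10(2) is not extremely close to a whole number.

```python
import math

LOG10_2 = math.log10(2)


def mersenne_digits(p):
    """Decimal digits of 2**p - 1, using floor(p*log(2)/log(10)) + 1."""
    return math.floor(p * LOG10_2) + 1


def min_exponent(digits):
    """Smallest exponent p (prime or not) whose Mersenne number has >= digits digits."""
    p = math.ceil((digits - 1) / LOG10_2)
    while mersenne_digits(p) < digits:               # guard against rounding low
        p += 1
    while p > 1 and mersenne_digits(p - 1) >= digits:  # guard against rounding high
        p -= 1
    return p


def is_prime(n):
    """Trial division; fine for exponents up to a few billion."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


def next_prime_exponent(p):
    """First prime exponent at or above p."""
    while not is_prime(p):
        p += 1
    return p


if __name__ == "__main__":
    for megadigits in (10, 100, 1000):
        p_min = min_exponent(megadigits * 10**6)
        p_prime = next_prime_exponent(p_min)
        print(megadigits, p_min, p_prime, mersenne_digits(p_prime))
```

As a sanity check, mersenne_digits(82589933) gives 24862048, matching the published digit count of that known Mersenne prime.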


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: pdf megadigits.pdf (29.7 KB, 295 views)

Last fiddled with by kriesel on 2019-11-15 at 22:28

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.