2018-05-24, 15:20   #1
kriesel
Mar 2017
US midwest
Mersenne Prime mostly-GPU Computing reference material

This thread is intended to be a repository for the most current available:
"what is it and what is it for"
"where to find it"
"how to use it"
"features and / or limits"
"bug lists, wish-list items, workarounds"
"where to find a discussion thread about it"
relevant to searching for Mersenne primes on GPUs.
Some incidental CPU-related or general content may be present, but the focus is on GPU-based Mersenne computing.

Content specific to one GPU application will be grouped in a separate thread about that GPU application.
This thread is for general reference info, mixed info, or comparisons between similar-purpose applications.

It's intended to be a useful-information-rich zone. Like a reference library.

It's _not_ intended to be a place to chat about them.
There are many other existing threads, and places to create new threads, including elsewhere in this blog subforum, and also PM, for that.
(Suggestions are welcome.) Discussion posts in this thread are not encouraged; please use the reference material discussion thread instead. Off-topic posts may be moved or removed, to keep the reference threads clean, tidy, and useful.

This thread and other threads in this blog are the result of seeking information about these topics, and not finding it in a current, organized, clear form, when I began GPU computing in ~March 2017. I pulled together what I could find in the existing threads, and after posting some there, was invited to start assembling it here. It continues to evolve and grow as the applications change, as my understanding changes, and as people ask questions and occasionally make contributions here or elsewhere. Much of this was written for my own use, to organize and preserve it before I forget it. Much also was written in interactions with others on threads outside this blog space, and then added here. Hopefully others will find it useful, such as to cut down on the confusion of new participants and the "Eternal September" problem for the experienced users helping them out, and for the moderators.

Thanks to all who came before, posted thought-provoking items elsewhere, asked questions or provided feedback, and to Xyzzy who offered the blog. And of course, none of it would exist without the remarkable creators of the software we use.

How to get started in GPU computing for GIMPS

All the following is in the context of personal computing equipment you own or have properly authorized physical access to. (Cloud computing, especially Google Colaboratory, and some cellphones are separate possibilities.)

First, have a GPU to use. A discrete (separate) GPU in some sort of personal computer, usually a PCIe card, is recommended.
Check the specifications of your system. Identify what model GPU it is and whether it is AMD, Intel, or NVIDIA. Web search for the specifications of the GPU. Saving a bookmark or copy for later reference is recommended.
(It's possible to use an integrated graphics processor (IGP) that's part of the CPU package, but performance tends to be quite low on those, and many IGPs are very limited in what they can run. They also generally use up processor-package power budget and so reduce the performance of whatever GIMPS software may be running on the CPU: Mlucas, mprime, or prime95.)

Second, note that GPU applications are text-only, not GUI. Plan to run them in a Linux terminal session or Windows command prompt box that will remain open after the application terminates (especially useful when it terminates due to an error). Output redirection from the console to a log file, or use of a good tee program, may be useful.
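As a sketch of the logging pattern above (the application name is a stand-in, not a real binary here, so an echo is used to make the lines runnable as written):

```shell
# Typical pattern -- "your_gpu_app" is a hypothetical stand-in for mfaktc, gpuowl, etc.:
#   ./your_gpu_app 2>&1 | tee -a app.log
# Demonstration with echo standing in for the GPU application:
echo "simulated GPU application output" 2>&1 | tee -a gpuapp.log
# The same text now also appears in the log file, surviving window closure:
cat gpuapp.log
```

The `2>&1` merges error messages into the same stream, and `tee -a` appends to the log while still showing output on the console.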

Also, mprime and prime95 have PrimeNet API support integrated, making assignment fetching, progress reporting, and result reporting automatic. GPU applications do not; there will be no automatic progress reporting. You'll need to obtain assignments and report results manually, use an included separate Python script for the applications that provide one, or use a separate helper application for client management. See the application-specific threads linked below.
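For illustration of the manual workflow, a trial-factoring assignment obtained from a work-coordination site gets pasted into the application's worktodo file as a single line. The Factor= format shown is the mfaktc style; the exponent and bit levels below are made-up example values, not a real reservation:

```shell
# Hedged example: append a TF assignment line to mfaktc's worktodo.txt.
# Assumed format: Factor=<assignment id or N/A>,<exponent>,<bit level from>,<bit level to>
printf 'Factor=N/A,332220523,76,77\n' >> worktodo.txt
cat worktodo.txt
```

When the run completes, the corresponding lines from the results file are submitted manually to the server.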

Third, choose what you want to run. Unlike prime95, which houses many computation types in one application, GPU software is often one program per computation type: trial factoring (TF), P-1 factoring, Lucas-Lehmer testing (LL) or probably-prime testing (PRP).

The minimum requirements for a GPU will depend on what software you plan to run; conversely, which software and versions are compatible will depend on the GPU model.

Which software you require will depend on the planned calculation type and whether your GPU is NVIDIA based, AMD based, or other.

If you want to run the computation type that is best suited for your GPU, do TF on GPUs with strong integer performance and weak double-precision floating point, such as the RTX20xx and GTX16xx. Do P-1, LL, or PRP on GPUs with relatively stronger double-precision floating point, such as most Teslas, Radeon VII, Vega, RX4xx, or RX5xx.
Conversely, if you want to run a specific computation type and will be buying a GPU to do so, look for performance in the relevant category. RTX20xx and GTX16xx are good buys for TF. Radeon VII is by far the best buy for P-1, LL, PRP. Their power requirements are substantial. Check your system for compatibility.

Generally you will need to ensure that the GPU, driver version, any library files required, and application software are mutually compatible. Otherwise there will be errors. The particulars vary by application, OS, & GPU model. (Avoid mixing very old and new cards in the same system. They can have mutually incompatible requirements.)

If using NVIDIA and CUDA-based applications, note that the latest CUDA version or driver does not always give the best performance for a given GPU, and as versions progress, may eventually not even be compatible with a given GPU. Read. Test.

If you want to do trial factoring:
Do normal GIMPS TF on a GPU, not a CPU; by comparison it is a waste of CPU time. GPUs are that good at TF.
If you have an NVIDIA GPU, mfaktc is the application. If the GPU is quite old, you may need a CUDA8 or earlier version of mfaktc. GTX 10xx GPUs require CUDA8; GTX16xx and RTX20xx require CUDA10 driver, libraries, and application software.

If it's an AMD GPU or Intel IGP, you want Mfakto and a working install of OpenCL for the device. Running Mfakto may cost half the CPU GIMPS application's performance, but produce more total GhzD/day. Example: on an i7-7500U, prime95 drops from 9.6 to 4.8, but Mfakto yields 20, for an increase from 9.6 to 24.8 total. With prime95 stopped, Mfakto yields 18 alone.

If you want to do P-1 factoring, you can consider CUDAPm1 or Gpuowl.
CUDAPm1 can run on old NVIDIA GPU models that are not capable of running Gpuowl. It is somewhat limited in exponent range and reliability.
Recent versions of Gpuowl can run P-1 on AMD or NVIDIA GPUs with a suitable driver and OpenCL installation.

If you want to run primality tests, there's a choice of LL or PRP (Lucas-Lehmer conclusive test or probably prime test).

Old NVIDIA GPUs may be incompatible with Gpuowl and so require using CUDALucas. Running that requires a mutually compatible GPU, driver, cuFFT library, CUDA runtime (cudart) library, and application version. Old NVIDIA GPUs with compute capability 2.x are unlikely to support Gpuowl, so they are CUDALucas candidates. These old cards are slow and power-inefficient, and are candidates for replacement.

Newer NVIDIA GPUs, and most AMD GPUs, can run recent versions of Gpuowl, which offer P-1, LL, and PRP. Gpuowl requires a suitable driver and OpenCL installation. PRP is recommended for new tests because of the excellent reliability of the Gerbicz error check. Gpuowl is generally faster than CUDALucas on GPUs that can run both. Proof-capable versions of Gpuowl nearly eliminate the double-checking work for PRP, making it effectively double the speed of LL with its required double check (LLDC) and occasional triple check (LLTC).

Some IGPs can run very early versions of Gpuowl (v1.9 or earlier in my testing). It's generally not worthwhile, costing more in reduced CPU performance than the progress it produces on the IGP. That's a drawback of IGPs, not of Gpuowl. Running on an IGP might be a useful or interesting generic learning exercise before buying a GPU.

If you want to run Gpuowl on a system that mostly or entirely does GIMPS computations, you may want to consider the performance advantages of Linux with the ROCm driver. (That's what it's developed on.) There are reports of over 510 GhzD/day at 5M FFT length on Linux and ROCm with good Radeon VIIs that can have their memory clocked to 1200 MHz. That is faster than on Windows with memory clock 1120 MHz, 448 GhzD/day at 5M FFT length, which extrapolates linearly to ~480 GhzD/day at 1200 MHz.
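The linear extrapolation above is just a scaling of throughput by the memory-clock ratio, and can be checked in one line:

```shell
# Scale the 448 GhzD/day Windows figure (at 1120 MHz memory clock)
# linearly to a 1200 MHz memory clock: 448 * 1200 / 1120 = 480.
awk 'BEGIN { printf "%.0f GhzD/day\n", 448 * 1200 / 1120 }'
```

This prints 480 GhzD/day, matching the ~480 figure quoted above.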

Finally, note that where you install GIMPS applications matters. The folder must have adequate permissions set to allow the program, save files, log files, worktodo file, batch files or shell scripts, etc., to be created, read, written, modified, and executed. That generally makes a subfolder of Program Files on Windows systems a poor choice. The drive root folder can also be a problem. Install in a user-owned folder and set permissions appropriately.
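A minimal Linux sketch of the advice above (the folder names are arbitrary examples, not required paths): put the install folder under your home directory, where the application can freely create and modify its files:

```shell
# Create a user-owned install folder (names here are arbitrary examples)
# so save files, logs, and worktodo files can be created and written.
mkdir -p "$HOME/gimps/gpuowl"
chmod u+rwx "$HOME/gimps/gpuowl"
# Verify the current user can write there:
[ -w "$HOME/gimps/gpuowl" ] && echo "folder is writable"
```

On Windows, the equivalent is a folder under your user profile (e.g. Documents) rather than under Program Files.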

Choose an application and computation type and OS.

Next, see post two of this thread below, and the application-specific threads, for more details and application-specific how-to directions:
Gpuowl on Linux, on Windows
(more links to come...eventually)

Table of contents for generic reference material (this thread)
  1. Intro (this post)
  2. Available Mersenne Prime hunting software
  3. Available Mersenne prime hunting client management software
  4. Disclaimer
  5. Ancestry of available software
  6. Utilities for GPU Computing, etc.
  7. List of fft lengths
  8. Four primality test programs' performance charted together (clLucas, CUDALucas, gpulucas, and gpuOwL)
  9. Mersenne prime hunt work coordination sites versus type and exponent
  10. Devcon (Automating recovery from Windows TDR events for GPUs)
  11. Table of megadigits; which Mersenne exponents have various orders of magnitude of decimal digits
  12. TF & P-1 optimization/tradeoff with each other and primality testing
  13. Assorted handy links
  14. Found a new prime? Really? What next?
  15. NVIDIA-smi
  16. TF & LL GhzD/day ratings & ratios and SP/DP ratios for certain GPUs
  17. P-1 bounds determination
  18. What limits trial factoring?
  19. Error rates
  20. Costs
  21. Reserving a specific exponent
  22. Worktodo entry formats
  23. GPUto72 and PrimeNet P-1 bounds
  24. Moving work in progress
  25. GPU P-1 applicability
  26. Save file (in)compatibility
  27. GPU benchmarks
  28. Result formats
  29. Application vs. operating system availability & compatibility
  30. PrimeNet P-1 bounds
  31. P-1 selftest candidates
  32. GPU serial numbers or other stable unique ids
  33. Result formats accepted by mersenne.*
  34. Optimal prp proof power versus exponent
  35. Requirements for comparability of interim residues
  36. Exponent limits
  37. Gerbicz Error Check block size
  38. etc tbd
Discussion thread
To keep the reference material threads as usable as possible, please put discussion posts in the discussion thread.

Looking for...
Information I'm looking for, and responses, here:

System management
System management notes

My attempt at writing a background piece
Ernst Mayer's ODROID article containing useful background information

Unlike CPU-centric applications like prime95/mprime, Mlucas, and gmp-ecm, which combine support of multiple processor types, OS versions, and usually multiple computation types into one program, the GPU applications tend to be specific to both a single computation type and a single hardware category. Trial factoring requires a different program than P-1 factoring; trial factoring on CUDA (NVIDIA) GPU hardware a different program than trial factoring on OpenCL hardware (AMD GPUs, Intel IGPs). There are more choices than the following links indicate; see the second post's attachment. Further, GPU applications have varying CUDA or compute-capability level requirements or OpenCL version requirements and dialects, hardware requirements, OS compatibility, display driver requirements, etc. The lone exception so far is gpuOwL, which has had, at various times, LL, Jacobi check, TF, P-1, PRP, and Gerbicz check; the current version includes P-1, PRP, and the Gerbicz check.

Links to other, application-specific reference threads (alphabetically by application name)
clLucas Lucas-Lehmer primality testing with OpenCL on GPUs
CUDALucas Lucas-Lehmer primality testing with CUDA on GPUs
CUDAPm1 P-1 factoring with CUDA on GPUs
gpuOwL PRP primality testing on OpenCL GPUs, or P-1 factoring or Lucas-Lehmer testing
Mfaktc CUDA based trial factoring on GPUs
Mfakto OpenCL based trial factoring on GPUs
Mlucas LL testing on a variety of CPUs
prime95 primality testing or factoring on Intel or AMD cpus
older software (predating GIMPS and GPUs)

Links to elsewhere:
Error rate plots versus exponent size

Top of reference tree:

Last fiddled with by kriesel on 2022-06-29 at 17:30 Reason: add Gerbicz error check block size