2020-01-16, 14:44   #7
kriesel
 
Best practices

In general, what would constitute best practices for GIMPS effort? My proposal:
  1. Use the most efficient algorithm and software for the task and hardware. Examples: gpuowl, not clLucas, for primality testing on AMD GPUs; gpuowl, not CUDALucas, on most NVIDIA GPUs (gpuowl v6.11-380 is quite efficient and versatile). Run first primality tests as PRP with GEC and proof generation, not PRP without either, much less LL. DON'T RUN LL FIRST TESTS IF YOU CAN AVOID IT! (In my testing, the productivity of gpuowl PRP/GEC with proof is over triple that of CUDALucas LL, with its consequent need for LLDC and the occasional LLTC, on identical hardware.)
  2. Select the most efficient hardware for the task. Examples: use an RTX20xx for TF; use a good CPU or a Radeon VII for PRP or P-1. Use GPUs with relatively high single-precision (SP) performance compared to double-precision (DP) for TF, and those with relatively high DP performance compared to SP for PRP, LL, or P-1 factoring. Most recent NVIDIA consumer GPUs (RTX20xx, GTX16xx, RTX30xx; those with high SP/DP performance ratios of 16:1, 32:1, or 64:1) are more effective for TF than for other GIMPS computation types. Most NVIDIA Tesla GPUs and AMD GPUs with low SP/DP ratios (16:1 or lower) are suitable for PRP, P-1, or LLDC; the Radeon VII is 4:1, and some Teslas are 2:1. CPUs are typically DP:SP 0.5 to 2. Don't use CPUs for production TF if it can be avoided, since GPUs are so much more effective at it.
  3. Use a very recent version of the chosen software. Temper that with what is most efficient and reliable. With the latest version sometimes come the latest bugs. Sometimes there are speed regressions. Benchmark.
  4. Use the most effective settings for the given software. Examples: run PRP with GEC and proof generation whenever practical as a first test, not PRP without proof generation, and definitely not LL first tests if PRP/GEC/proof can be done; benchmark prime95/mprime throughput versus number of cores per worker at various FFT lengths, analyze the results, and reconfigure when appropriate. PRP with proof provides about double the effective throughput for the project vs. PRP without proof.
  5. Use judiciously chosen inputs, for reasonable run time and feasibility of completing the task accurately. A run that takes years is not only likely to expire before completion, it is unlikely to complete accurately unless it is protected by the GEC.
  6. Always log the runs. Some applications have logging built in. Others will need tee or redirection. Run command-line GIMPS applications in terminal sessions that will remain after program termination, so that you have time to read error messages. Some programs employ both stdout and stderr, and redirection for both may be useful.
  7. Tune the application for the specific software version and hardware involved and exponents & other parameters being run. (For example, mfaktc's optimal tune may differ for the same hardware for differing exponent or bit level.)
  8. Run at least one double-check, and a memory test, per piece of hardware (CPU or GPU), to test the reliability of the hardware & software combination, before beginning production running. Only run GIMPS work on systems confirmed to be reliable.
  9. Regularly review the logs for errors, either manually or with an analysis tool.
  10. Repeat the double-check or self-test and the memory test at least annually. Hardware reliability changes over time, generally for the worse. Remedy unreliable hardware by repairing it or by removing it from GIMPS service.
  11. Re-tune if substantially changing the exponents being run, or when a new version of the software is deployed.
  12. Reserve assignments first. Then expeditiously finish and report them. Don't poach the assignments of others. Don't reserve more than you can complete in a reasonable amount of time (about a month, or less if the assignments expire sooner). Don't hoard assignments.
  13. Select work types and assignments appropriate to the capabilities of the hardware and software, so that assignments complete in a reasonable amount of time. (In most cases that will be under two months.)
  14. Contribute at least about 1/5 of your primality testing effort as double-checking, while a double-checking backlog remains.
  15. Prioritize advancing the GIMPS wavefronts of TF, P-1, first primality testing, and verification. This is the most effective way to advance the state of knowledge about Mersenne primes.
  16. When P-1 factoring, use the full GPU72 bounds given for the exponent at mersenne.ca when possible. (That is the most effective strategy while exponents are being both primality tested and verified, as they currently are. And improvements in P-1 factoring performance in prime95 v30.7 & v30.8 and gpuowl v7.x make higher bounds than before productive.)
  17. Be vigilant and relentless about controlling error rate. Use PRP with GEC whenever it fits the work and the hardware. Use CUDALucas for LL only for DC on old GPUs that can't run a current version of gpuowl with PRP or LLDC. Take corrective measures as required to force the mean GEC-indicated error rate below 1 error detected and corrected per GPU per month, or per CPU worker per month. (What's the right error rate threshold here?)
  18. Become familiar with the normal behavior of the GIMPS GPU applications you run. It makes it easier to spot when something is not right.
  19. If making manual additions to worktodo files, err on the side of giving complete information to the program.
  20. For project efficiency, use PRP with GEC and proof generation whenever practical, not mere PRP or LL. This reduces verification effort by ~99+%, yet provides greater confidence in the result than matching res64 values from double checks.
  21. UPLOAD your proof files! Preferably promptly.
  22. Have fun!
  23. What else?
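
For item 6, where an application has no built-in logging and tee is not handy, a small wrapper can do the same job. Below is a minimal Python sketch, not something shipped with any GIMPS application: run_logged and the log file name are hypothetical, and the command you pass in is whatever program you actually run. It merges stderr into stdout so that both end up in one log, and it keeps echoing to the terminal.

```python
import subprocess
import sys
from datetime import datetime


def run_logged(cmd, log_path):
    """Run cmd, echo its output to the console, and append it to log_path.

    stderr is merged into stdout so error messages land in the same log.
    Returns the program's exit code.
    """
    with open(log_path, "a", encoding="utf-8") as log:
        stamp = datetime.now().isoformat(timespec="seconds")
        log.write(f"--- {stamp} started: {' '.join(cmd)}\n")
        with subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # both streams go into one pipe
            text=True,
            bufsize=1,  # line buffered on the reading side
        ) as proc:
            for line in proc.stdout:
                sys.stdout.write(line)  # still visible in the terminal
                log.write(line)
                log.flush()  # keep the log current if the run is killed
        # Leaving the inner "with" waits for the process, so returncode is set.
        log.write(f"--- exit code {proc.returncode}\n")
    return proc.returncode
```

Usage would be along the lines of run_logged(["./your_gimps_app"], "run.log"), where both names are placeholders. Appending rather than overwriting means earlier sessions' error messages survive a restart.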
see also https://www.mersenneforum.org/showth...225#post535225
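
The "about double" in item 4 and the "~99+%" in item 20 fit together with simple arithmetic. Here is a rough Python sketch with the assumptions labeled: one first test costs 1 unit, a conventional double check costs another full unit, verifying a proof costs about 1% of a test (taken from the ~99% reduction in item 20), and proof generation overhead, triple checks, and error retries are all ignored. The function name is made up for illustration.

```python
def project_cost_per_exponent(verification_fraction):
    """Total project effort to test and verify one exponent.

    Units are "one primality test". verification_fraction is the cost of
    verification relative to the first test (an assumed input, not measured).
    """
    first_test = 1.0
    return first_test * (1.0 + verification_fraction)


# PRP without proof: verification is a full second test.
cost_without_proof = project_cost_per_exponent(1.0)

# PRP with proof: verification assumed to cost ~1% of a test.
cost_with_proof = project_cost_per_exponent(0.01)

throughput_gain = cost_without_proof / cost_with_proof
print(f"{throughput_gain:.2f}")  # prints 1.98
```

Under these assumptions the project verifies nearly twice as many exponents for the same total effort, which is where "about double" comes from.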


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2022-12-18 at 21:27 Reason: added to #1 & #6 & #7 & #16, & minor edits elsewhere