#276 |
P90 years forever!
Aug 2002
Yeehaw, FL
1111011010000₂ Posts |
30.8 build 7
A few bugs were hopefully fixed.

This build concentrates on improving multithreading when working on exponents below, say, 1M. The gwnum library does not support multithreading for one-pass FFTs and does not scale particularly well for the smallest two-pass FFTs. In the previous build polymult was fully multithreaded, but the poly preparation and poly results processing relied on the gwnum library for any multithreading. For my work on exponents around M80000, stage 2 was spending about half the time quad-threaded and half the time single-threaded. Much better now -- about 35% faster. The default is to not use gwnum's multithreading for FFT sizes below 256K.

The problem with severely over-allocating memory for small exponents (below, say, 40K) is a little better, but by no means fixed. I understand the issue, but do not yet understand the best solution.

For those that want to P-1 exponents below 100K (and maybe below 1M or 2M), I do the following on my quad-core box. Build a big worktodo file using 4 workers, 1 core per worker. Each worktodo line does stage 1 only. Example:
Code:
Pminus1=1,2,79039,-1,1000000000,1000000000
Code:
Pminus1=1,2,79039,-1,1000000000,1000000000,117
Linux 64-bit: https://mersenne.org/ftp_root/gimps/...linux64.tar.gz |
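The stage-1-only worktodo described above could be generated with a small script. This is only a sketch under two assumptions that the post does not spell out: that setting B2 equal to B1 (as in the example line) is what limits a work item to stage 1, and that worktodo.txt groups entries under `[Worker #N]` headers. The helper names and the exponent list are hypothetical.

```python
from typing import Iterable, List


def stage1_line(exponent: int, b1: int) -> str:
    """Build a Pminus1 entry for 2^exponent - 1 with B2 == B1.

    Assumption: B2 == B1 means "stage 1 only", matching the example line.
    """
    return f"Pminus1=1,2,{exponent},-1,{b1},{b1}"


def build_worktodo(exponents: Iterable[int], b1: int, workers: int = 4) -> str:
    """Spread exponents round-robin over the workers and render worktodo text."""
    buckets: List[List[str]] = [[] for _ in range(workers)]
    for i, p in enumerate(exponents):
        buckets[i % workers].append(stage1_line(p, b1))
    sections = []
    for n, lines in enumerate(buckets, start=1):
        # Assumed section header format for per-worker work lists.
        sections.append("\n".join([f"[Worker #{n}]"] + lines))
    return "\n\n".join(sections) + "\n"


if __name__ == "__main__":
    # Illustrative exponents only; 79039 is the one from the post.
    print(build_worktodo([79039, 79043, 79063, 79087, 79103], 1_000_000_000))
```

Round-robin assignment keeps the four single-core workers roughly balanced when the exponents are of similar size.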
#277 | |
"Seth"
Apr 2019
2⁴·3³ Posts |
Quote:
The setup when this was failing was 3 workers, with 8 cores for the last worker. mprime would consistently run with 100% CPU on the 8 cores, then all of those cores would drop to 0% without any status lines (and without saving a finished stage 1 file), and mprime would be stuck such that Ctrl+C would just hang (repeatedly printing "Waiting for worker threads to stop."). The work item was:
Code:
Pminus1=1,2,38197477,-1,25000000,0,"229184863,1527899081,16730494927,1442642311337,2675886053759,2439808457033351,31168004071948307743,470793337824880328249,2381826327823260512809"
Here are some old screen logs (from build 6):
Code:
[Worker #3 Dec 30 00:40] M38197477 stage 1 is 99.46% complete.
[Worker #2 Dec 30 00:40] M9100919 stage 1 is 47.34% complete.
[Worker #1 Dec 30 01:27] M1938317 stage 1 is 38.85% complete. Time: 2792.084 sec.
[Worker #1 Dec 30 02:13] M1938317 stage 1 is 39.29% complete. Time: 2788.538 sec.
[Worker #2 Dec 30 02:19] M9100919 stage 1 is 48.72% complete. Time: 5911.293 sec.
Invalid choice
Main Menu
1. Test/Primenet
2. Test/Workers
3. Test/Status
4. Test/Stop
5. Test/Exit
6. Advanced/Test
7. Advanced/Time
8. Advanced/P-1
9. Advanced/ECM
10. Advanced/Manual Communication
11. Advanced/Unreserve Exponent
12. Advanced/Quit Gimps
13. Options/CPU
14. Options/Resource Limits
15. Options/Preferences
16. Options/Torture Test
17. Options/Benchmark
18. Help/About
19. Help/About PrimeNet Server
Your choice: 5
[Main thread Dec 30 02:23] Stopping all worker windows.
Waiting for worker threads to stop.
[Worker #2 Dec 30 02:23] Worker stopped.
^C[Main thread Dec 30 02:23] Stopping all worker windows.
^Z
[1]+ Stopped ./mprime -m -d
seven@seven:~/Projects/GIMPS/p95v308b6$ kill %1
I repeated this 3 times, giving it plenty of time to complete, and the output was the same each time (attempt 3 is shown here). The final 0.5% should take roughly 300 seconds, which is less than the 10 minutes given here or the roughly 2 hours in the first log.
Code:
[Worker #3 Dec 30 02:36] Worker starting
[Worker #3 Dec 30 02:36] Setting affinity to run worker on CPU core #6
[Worker #3 Dec 30 02:36]
[Worker #3 Dec 30 02:36] P-1 on M38197477 with B1=25000000, B2=2500000000
[Worker #3 Dec 30 02:36] Setting affinity to run helper thread 1 on CPU core #7
[Worker #3 Dec 30 02:36] Setting affinity to run helper thread 2 on CPU core #8
[Worker #3 Dec 30 02:36] Setting affinity to run helper thread 5 on CPU core #11
[Worker #3 Dec 30 02:36] Setting affinity to run helper thread 4 on CPU core #10
[Worker #3 Dec 30 02:36] Setting affinity to run helper thread 6 on CPU core #12
[Worker #3 Dec 30 02:36] Setting affinity to run helper thread 7 on CPU core #13
[Worker #3 Dec 30 02:36] Setting affinity to run helper thread 3 on CPU core #9
[Worker #3 Dec 30 02:36] Using AVX FFT length 2M, Pass1=512, Pass2=4K, clm=4, 8 threads
[Worker #3 Dec 30 02:36] M38197477 stage 1 is 99.46% complete.
[Worker #2 Dec 30 02:36] M9100919 stage 1 is 48.94% complete.
Invalid choice
Main Menu
1. Test/Primenet
2. Test/Workers
3. Test/Status
4. Test/Stop
5. Test/Exit
6. Advanced/Test
7. Advanced/Time
8. Advanced/P-1
9. Advanced/ECM
10. Advanced/Manual Communication
11. Advanced/Unreserve Exponent
12. Advanced/Quit Gimps
13. Options/CPU
14. Options/Resource Limits
15. Options/Preferences
16. Options/Torture Test
17. Options/Benchmark
18. Help/About
19. Help/About PrimeNet Server
Your choice: 5
[Main thread Dec 30 02:46] Stopping all worker windows.
Waiting for worker threads to stop.
[Worker #2 Dec 30 02:46] Worker stopped.
[Worker #1 Dec 30 02:46] Worker stopped.
^C[Main thread Dec 30 02:46] Stopping all worker windows.
^Z
[1]+ Stopped ./mprime -m -d
Last fiddled with by SethTro on 2022-01-02 at 00:27 |
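When reproducing a report like this, it can help to pull the work item apart and sanity-check the quoted list before feeding it to mprime. The sketch below assumes the quoted field is a list of already-known factors of k·b^n+c (the thread does not say so explicitly, though the values have the 2kp+1 shape expected of Mersenne factors). `parse_pminus1` and `divides` are hypothetical helpers, not part of any Prime95 tooling.

```python
WORK_ITEM = (
    'Pminus1=1,2,38197477,-1,25000000,0,'
    '"229184863,1527899081,16730494927,1442642311337,2675886053759,'
    '2439808457033351,31168004071948307743,470793337824880328249,'
    '2381826327823260512809"'
)


def parse_pminus1(line: str) -> dict:
    """Split a Pminus1 worktodo entry into its numeric fields and quoted list."""
    key, _, rest = line.strip().partition("=")
    if key != "Pminus1":
        raise ValueError("not a Pminus1 entry")
    head, quote, tail = rest.partition('"')
    # Assumption: the quoted field holds comma-separated known factors.
    factors = [int(f) for f in tail.rstrip('"').split(",") if f] if quote else []
    nums = [int(x) for x in head.rstrip(",").split(",")]
    k, b, n, c, b1, b2 = nums[:6]
    return {"k": k, "b": b, "n": n, "c": c, "B1": b1, "B2": b2, "factors": factors}


def divides(f: int, k: int, b: int, n: int, c: int) -> bool:
    """True if f divides k*b^n + c, using modular exponentiation."""
    return (k * pow(b, n, f) + c) % f == 0


if __name__ == "__main__":
    item = parse_pminus1(WORK_ITEM)
    print(item["n"], item["B1"], item["B2"], len(item["factors"]))
    for f in item["factors"]:
        print(f, divides(f, item["k"], item["b"], item["n"], item["c"]))
```

`pow(b, n, f)` keeps the check cheap even for a 38M-bit exponent, since only residues modulo each candidate factor are ever computed.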
#278 |
Jun 2003
5365₁₀ Posts |
Build 7 observations:
Extra helper thread affinity works properly.
Init is 10% faster (not a super big poly - presumably bigger ones will benefit more).
Using smaller FFT size for stage 2 (round offs are, consequently, bigger).
More details included in results.json, presumably allowing proper credit calculation.
Consequently, Manual submission form chokes on the new JSON format.
EDIT:- B2 adjustment now works for fixed B2 as well.
Last fiddled with by axn on 2022-01-02 at 03:32 |
#279 |
"James Heinrich"
May 2004
ex-Northern Ontario
E84₁₆ Posts |
#280 |
Oct 2021
Germany
2²·5² Posts |
One of my workers just spit out this line while Stopping/exiting during stage 2 in build 7:
Code:
5
[Main thread Jan 2 19:22] Stopping all worker windows.
Waiting for worker threads to stop.
Waiting for worker threads to stop.
Waiting for worker threads to stop.
Waiting for worker threads to stop.
[Main thread Jan 2 19:22] In write_giant, unexpected len == 0 failure
[Work thread Jan 2 19:22] Worker stopped.
[Main thread Jan 2 19:22] Execution halted.
[Main thread Jan 2 19:22] Choose Test/Continue to restart.
Another edit: I just had a round-off error with this AVX-512 machine. The exponent is a 9.8M one, which could explain why this machine is sometimes going so slow today. I backscrolled a bit before the error, to the init, and noticed this absolute madness:
Code:
[Worker #2 Jan 2 19:46] Using AVX-512 FFT length 504K, Pass1=1344, Pass2=384, clm=1, 9 threads
[Worker #2 Jan 2 19:46] Setting affinity to run helper thread 8 on CPU core #18
[Worker #2 Jan 2 19:46] Setting affinity to run helper thread 6 on CPU core #16
[Worker #2 Jan 2 19:46] M9801149 stage 1 complete. 0 transforms. Total time: 0.000 sec.
[Worker #2 Jan 2 19:46] Round off: 0
[Worker #2 Jan 2 19:46] Inversion of stage 1 result complete. 5 transforms, 1 modular inverse. Time: 2.196 sec.
[Worker #2 Jan 2 19:46] Available memory is 112634MB.
[Worker #2 Jan 2 19:46] Setting affinity to run helper thread 1 on CPU core #11
[Worker #2 Jan 2 19:46] Setting affinity to run helper thread 5 on CPU core #15
[Worker #2 Jan 2 19:46] Setting affinity to run helper thread 2 on CPU core #12
[Worker #2 Jan 2 19:46] Setting affinity to run helper thread 6 on CPU core #16
[Worker #2 Jan 2 19:46] Switching to AVX-512 FFT length 512K, Pass1=1K, Pass2=512, clm=1, 9 threads
[Worker #2 Jan 2 19:46] Setting affinity to run helper thread 8 on CPU core #18
[Worker #2 Jan 2 19:46] Setting affinity to run helper thread 4 on CPU core #14
[Worker #2 Jan 2 19:46] Setting affinity to run helper thread 3 on CPU core #13
[Worker #2 Jan 2 19:46] Setting affinity to run helper thread 7 on CPU core #17
[Worker #2 Jan 2 19:46] Available memory is 112634MB.
[Worker #2 Jan 2 19:46] Setting affinity to run helper thread 1 on CPU core #11
[Worker #2 Jan 2 19:46] Setting affinity to run helper thread 2 on CPU core #12
[Worker #2 Jan 2 19:46] Setting affinity to run helper thread 5 on CPU core #15
[Worker #2 Jan 2 19:46] Setting affinity to run helper thread 6 on CPU core #16
[Worker #2 Jan 2 19:46] Switching to AVX-512 FFT length 560K, Pass1=896, Pass2=640, clm=1, 9 threads
[Worker #2 Jan 2 19:46] Setting affinity to run helper thread 8 on CPU core #18
[Worker #2 Jan 2 19:46] Setting affinity to run helper thread 4 on CPU core #14
[Worker #2 Jan 2 19:46] Setting affinity to run helper thread 3 on CPU core #13
[Worker #2 Jan 2 19:46] Setting affinity to run helper thread 7 on CPU core #17
[Worker #2 Jan 2 19:46] Available memory is 112634MB.
[Worker #2 Jan 2 19:46] Setting affinity to run helper thread 1 on CPU core #11
[Worker #2 Jan 2 19:46] Setting affinity to run helper thread 2 on CPU core #12
[Worker #2 Jan 2 19:46] Setting affinity to run helper thread 5 on CPU core #15
[Worker #2 Jan 2 19:46] Setting affinity to run helper thread 6 on CPU core #16
[Worker #2 Jan 2 19:46] Switching to AVX-512 FFT length 576K, Pass1=1152, Pass2=512, clm=1, 9 threads
[Worker #2 Jan 2 19:46] Setting affinity to run helper thread 8 on CPU core #18
[Worker #2 Jan 2 19:46] Setting affinity to run helper thread 4 on CPU core #14
[Worker #2 Jan 2 19:46] Setting affinity to run helper thread 3 on CPU core #13
[Worker #2 Jan 2 19:46] Setting affinity to run helper thread 7 on CPU core #17
[Worker #2 Jan 2 19:46] Available memory is 112634MB.
[Worker #2 Jan 2 19:46] Setting affinity to run helper thread 1 on CPU core #11
[Worker #2 Jan 2 19:46] Setting affinity to run helper thread 2 on CPU core #12
[Worker #2 Jan 2 19:46] Setting affinity to run helper thread 5 on CPU core #15
[Worker #2 Jan 2 19:46] Setting affinity to run helper thread 6 on CPU core #16
[Worker #2 Jan 2 19:46] Setting affinity to run helper thread 7 on CPU core #17
[Worker #2 Jan 2 19:46] Setting affinity to run helper thread 4 on CPU core #14
[Worker #2 Jan 2 19:46] Switching to AVX-512 FFT length 588K, Pass1=1344, Pass2=448, clm=1, 9 threads
[Worker #2 Jan 2 19:46] Setting affinity to run helper thread 8 on CPU core #18
[Worker #2 Jan 2 19:46] Setting affinity to run helper thread 3 on CPU core #13
[Worker #2 Jan 2 19:46] Available memory is 112634MB.
[Worker #2 Jan 2 19:46] Using 112635MB of memory. D: 43890, 4320x19972 polynomial multiplication.
Code:
[Worker #2 Jan 2 19:50] Round off: 0.2715111504
[Worker #1 Jan 2 19:50] M9817169 stage 1 is 0.69% complete. Time: 5.011 sec.
[Worker #1 Jan 2 19:50] M9817169 stage 1 is 1.38% complete. Time: 4.945 sec.
[Worker #1 Jan 2 19:50] M9817169 stage 1 is 2.07% complete. Time: 4.947 sec.
[Worker #2 Jan 2 19:50] Possible roundoff error (0.46648509), backtracking to last save file and using larger FFT.
[Worker #1 Jan 2 19:50] M9817169 stage 1 is 2.77% complete. Time: 4.772 sec.
[Worker #2 Jan 2 19:50] Setting affinity to run helper thread 1 on CPU core #11
[Worker #2 Jan 2 19:50] Setting affinity to run helper thread 5 on CPU core #15
[Worker #2 Jan 2 19:50] Using AVX-512 FFT length 512K, Pass1=1K, Pass2=512, clm=1, 9 threads
[Worker #2 Jan 2 19:50] Setting affinity to run helper thread 6 on CPU core #16
[Worker #2 Jan 2 19:50] Setting affinity to run helper thread 2 on CPU core #12
[Worker #2 Jan 2 19:50] Setting affinity to run helper thread 4 on CPU core #14
[Worker #2 Jan 2 19:50] Setting affinity to run helper thread 3 on CPU core #13
[Worker #2 Jan 2 19:50] Setting affinity to run helper thread 8 on CPU core #18
[Worker #2 Jan 2 19:50] Setting affinity to run helper thread 7 on CPU core #17
[Worker #1 Jan 2 19:50] Restarting worker with new memory settings.
[Worker #2 Jan 2 19:50] M9801149 stage 1 complete. 0 transforms. Total time: 0.000 sec.
[Worker #2 Jan 2 19:50] Round off: 0
Last fiddled with by Luminescence on 2022-01-02 at 19:01 |
#281 |
"Oliver"
Sep 2017
Porta Westfalica, DE
983 Posts |
Situation:
I wanted to continue my work in the 200k range with the now possible higher bounds. Problems:
Observations:
Shortened screen output: Code:
[Work thread Jan 4 11:26] P-1 on M201557 with B1=30000000, B2=TBD
[...]
[Work thread Jan 4 11:29] Using FMA3 FFT length 10K, Pass1=128, Pass2=80, clm=2, 8 threads
[...]
[Work thread Jan 4 11:29] Switching to FMA3 FFT length 12K, Pass1=256, Pass2=48, clm=1, 6 threads
[Work thread Jan 4 11:29] Using 45875MB of memory. D: 2310, 240x483449 polynomial multiplication.
[...]
[Work thread Jan 4 11:29] Round off: 0, poly_size: 2, EB: 9.5245, SM: 0
[Work thread Jan 4 11:29] Round off: 0, poly_size: 4
[Work thread Jan 4 11:29] Round off: 0, poly_size: 8
[Work thread Jan 4 11:29] Round off: 0, poly_size: 16
[Work thread Jan 4 11:29] Round off: 0, poly_size: 32
[Work thread Jan 4 11:29] Round off: 0, poly_size: 64
[Work thread Jan 4 11:29] Round off: 0, poly_size: 128
[Work thread Jan 4 11:29] Round off: 0, poly_size: 256
[Work thread Jan 4 11:32] Stage 2 init complete. 6089 transforms. Time: 151.955 sec.
[Work thread Jan 4 11:32] Round off: 0
[Work thread Jan 4 11:32] M201557 stage 2 at B2=1301604150 [26.03%]
[Work thread Jan 4 11:33] M201557 stage 2 at B2=2417264850 [48.34%]. Time: 57.183 sec.
[Work thread Jan 4 11:34] M201557 stage 2 at B2=3532925550 [70.65%]. Time: 57.348 sec.
[Work thread Jan 4 11:35] M201557 stage 2 at B2=4648586250 [92.97%]. Time: 57.274 sec.
[Work thread Jan 4 11:38] M201557 stage 2 complete. 19328626 transforms. Total time: 360.524 sec.
[Work thread Jan 4 11:38] Stage 2 GCD complete. Time: 0.110 sec.
[Work thread Jan 4 11:38] M201557 completed P-1, B1=30000000, B2=5764246950, Wi4: 5C1636F6 |
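One pattern in the "D: ..., AxB polynomial multiplication" lines may be worth noting. In both such lines quoted in this thread (D: 2310 with 240, and D: 43890 with 4320), the first dimension equals the number of integers below D/2 that are coprime to D, that is φ(D)/2. Whether the program actually derives the size this way is an assumption; nobody in the thread states it. A quick check, with `half_totient` as a hypothetical helper name:

```python
from math import gcd


def half_totient(d: int) -> int:
    """Count k in [1, d/2] with gcd(k, d) == 1.

    For d > 2 this equals phi(d) / 2, because k and d - k are
    coprime to d together and d/2 itself is never coprime to d.
    """
    return sum(1 for k in range(1, d // 2 + 1) if gcd(k, d) == 1)


if __name__ == "__main__":
    # D values copied from the two log lines in this thread.
    for d, logged in ((2310, 240), (43890, 4320)):
        print(d, half_totient(d), logged)
```

Both D values are products of small distinct primes (2310 = 2·3·5·7·11 and 43890 = 2310·19), which keeps φ(D) small relative to D.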
#282 |
"Oliver"
Sep 2017
Porta Westfalica, DE
1727₈ Posts |
Another observation, contrary to axn's:
Code:
[Work thread Jan 4 11:38] P-1 on M29833387 with B1=1750000, B2=400000000
[...]
[Work thread Jan 4 11:38] Using FMA3 FFT length 1536K, Pass1=1536, Pass2=1K, clm=1, 8 threads
[stage 1, ...]
[Work thread Jan 4 12:08] Switching to FMA3 FFT length 1600K, Pass1=320, Pass2=5K, clm=2, 8 threads
[...]
[Work thread Jan 4 12:08] Switching to FMA3 FFT length 1680K, Pass1=448, Pass2=3840, clm=2, 8 threads
[...]
[Work thread Jan 4 12:08] Switching to FMA3 FFT length 1792K, Pass1=1792, Pass2=1K, clm=1, 8 threads |
#283 |
Jun 2003
5365₁₀ Posts |
I think this is still consistent with my experience. My report of smaller FFT size was relative to the previous version.
For a 27.0M exponent, stage 1 uses 1536K and stage 2 uses 1680K with build 6 and earlier. However, with build 7, it switched to a 1600K FFT for stage 2.
Looks like it _tries_ to do the same with 29.8M also. It tries to use a 1600K FFT, but sees that the round off is bad, increases to 1680K, and finally settles on 1792K. My guess is that previous builds would jump directly to a 1792K (or higher) FFT.
So the general observation is that build 7 uses the same or a smaller FFT for stage 2 compared to previous builds.
Last fiddled with by axn on 2022-01-04 at 12:26 |
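To compare builds on this point, the sequence of FFT lengths can be pulled out of a saved screen log. This is a rough sketch that assumes the "Using ... FFT length" and "Switching to ... FFT length" wording seen in the logs in this thread; `fft_lengths` is a hypothetical helper and the regular expression is not taken from any Prime95 documentation.

```python
import re
from typing import Iterable, List

# Matches e.g. "Using FMA3 FFT length 1536K" or "Switching to AVX-512 FFT length 2M".
FFT_RE = re.compile(r"(?:Using|Switching to) \S+ FFT length (\d+)([KM])")


def fft_lengths(log_lines: Iterable[str]) -> List[int]:
    """Return the FFT lengths, in units of K, in the order they appear."""
    lengths = []
    for line in log_lines:
        m = FFT_RE.search(line)
        if m:
            size, unit = int(m.group(1)), m.group(2)
            lengths.append(size * 1024 if unit == "M" else size)
    return lengths


SAMPLE = [
    "[Work thread Jan 4 11:38] Using FMA3 FFT length 1536K, Pass1=1536, Pass2=1K, clm=1, 8 threads",
    "[Work thread Jan 4 12:08] Switching to FMA3 FFT length 1600K, Pass1=320, Pass2=5K, clm=2, 8 threads",
    "[Work thread Jan 4 12:08] Switching to FMA3 FFT length 1680K, Pass1=448, Pass2=3840, clm=2, 8 threads",
    "[Work thread Jan 4 12:08] Switching to FMA3 FFT length 1792K, Pass1=1792, Pass2=1K, clm=1, 8 threads",
]

if __name__ == "__main__":
    print(fft_lengths(SAMPLE))  # [1536, 1600, 1680, 1792]
```

The first entry is the stage 1 size and the last is the size stage 2 settles on, so the same script run over build 6 and build 7 logs would show whether the final size differs.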
#284 |
"Vincent"
Apr 2010
Over the rainbow
17×167 Posts |
On 30.8b6, an 8.5M exponent gets a 448K FFT for stage 1 and 512K for stage 2.
I have yet to test with b7. |
#285 |
P90 years forever!
Aug 2002
Yeehaw, FL
2⁴·17·29 Posts |
That is correct. I ought to be able to figure out the best FFT size without all that switching. So much to do, so little time...
#286 |
P90 years forever!
Aug 2002
Yeehaw, FL
2⁴×17×29 Posts |
Quote:
Quote:
Quote:
Quote:
Quote:
This bug should not be hard to fix. |