#518
"University student"
May 2021
Beijing, China
2²·5·13 Posts
@George:
Could you allow "Minutes between disk saves" to be set below 10? I run several Colab instances which terminate randomly, and their time is limited, yet I see no technical restriction on how often the program can write to disk. I have long wanted to set the value to 1 or 2, so that when an instance is terminated only 1 or 2 minutes of work are lost rather than 10.
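If the 10-minute floor were lifted, I would simply put this in prime.txt (assuming DiskWriteTime is the key behind that dialog field; treat the exact name as my guess):
Code:
DiskWriteTime=2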
#519
P90 years forever!
Aug 2002
Yeehaw, FL
53·149 Posts

Quote:
#520
Random Account
Aug 2009
2148₁₀ Posts
I ran into problems with v30.8 B12 yesterday evening. I wanted to see how it would handle a small ECM:
Code:
ECM2=N/A,1,2,4363,-1,110000000,11000000000,5
Code:
Faulting application name: prime95.exe, version: 30.8.1.0, time stamp: 0x62421166
Faulting module name: prime95.exe, version: 30.8.1.0, time stamp: 0x62421166
Exception code: 0xc0000005
Fault offset: 0x000000000201b0cc
Faulting process id: 0x6ac
Faulting application start time: 0x01d84bb4cd2bb5d0
Faulting application path: C:\Prime95\prime95.exe
Faulting module path: C:\Prime95\prime95.exe
Report Id: 23d3ab23-209a-4c26-98eb-95f4dbf56784
Faulting package full name:
Faulting package-relative application ID:

Perhaps these two systems are no longer compatible with the ECM code. 29.x runs this without problems.

I have a third system which runs Windows 7. It's not often that I start it for anything. I put 30.8 on it, and it started alright with the ECM line I quoted above. I set the RAM at 3 GB. This will be a really long process for that machine.

Edit: The run on the i5 (Windows 7) system also failed. A "program not responding" dialog appeared.

Last fiddled with by storm5510 on 2022-04-09 at 16:20
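For anyone parsing the worktodo line above, my informal reading of the ECM2 fields (treat this breakdown as an assumption, not documentation):
Code:
ECM2=N/A,1,2,4363,-1,110000000,11000000000,5
 AID=N/A, k=1, b=2, n=4363, c=-1  ->  k*b^n+c = 2^4363-1
 B1=110000000, B2=11000000000, curves=5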
#521
"Oliver"
Sep 2017
Porta Westfalica, DE
3·73 Posts
Two observations:
#522
P90 years forever!
Aug 2002
Yeehaw, FL
53×149 Posts

Quote:
I vaguely recall that large pages under Windows also required some registry settings. It's been many years since I wrote that code, so I can't help much. I do remember that the procedure was so complex that I gave up on mainstream Windows users ever being able to use the feature. Large pages on Linux are much easier. However, the performance increase, if any, is not measurable.
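For anyone who wants to experiment anyway, the usual Windows recipe is: grant the account the "Lock pages in memory" user right (secpol.msc — possibly the setting being recalled here), enable SeLockMemoryPrivilege on the process token, then request MEM_LARGE_PAGES from VirtualAlloc. A minimal sketch, not Prime95's actual code (link with advapi32):
Code:
#include <windows.h>
#include <stdio.h>

static BOOL enable_lock_memory_privilege(void)
{
    HANDLE token;
    TOKEN_PRIVILEGES tp;

    if (!OpenProcessToken(GetCurrentProcess(),
                          TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY, &token))
        return FALSE;
    tp.PrivilegeCount = 1;
    tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;
    if (!LookupPrivilegeValue(NULL, SE_LOCK_MEMORY_NAME,
                              &tp.Privileges[0].Luid) ||
        !AdjustTokenPrivileges(token, FALSE, &tp, 0, NULL, NULL)) {
        CloseHandle(token);
        return FALSE;
    }
    CloseHandle(token);
    /* AdjustTokenPrivileges "succeeds" even when nothing was granted. */
    return GetLastError() != ERROR_NOT_ALL_ASSIGNED;
}

int main(void)
{
    SIZE_T large = GetLargePageMinimum();   /* typically 2 MB on x64 */
    SIZE_T want = 64 * 1024 * 1024;
    void *p;

    if (large == 0 || !enable_lock_memory_privilege())
        return 1;
    /* The request size must be a multiple of the large-page minimum. */
    p = VirtualAlloc(NULL, (want + large - 1) & ~(large - 1),
                     MEM_RESERVE | MEM_COMMIT | MEM_LARGE_PAGES,
                     PAGE_READWRITE);
    printf("large-page alloc %s\n", p ? "succeeded" : "failed");
    return 0;
}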
#523
"Oliver"
Sep 2017
Porta Westfalica, DE
3×73 Posts
With the new stage 2 for P-1, I think your buffer handling must have changed more than I originally thought. I investigated large pages a bit further. I copied your code from gwutil.c and found that it correctly allocates large pages on a Windows 10 system without special settings, even without administrator rights.
So I double-checked the memory allocation of stage 2 in Task Manager again, and then it hit me: half of the memory is allocated in large pages, while the other half seems to be allocated differently. So maybe this is something that is fixable? It should also occur this way on Linux, but I have not yet checked.

Why I want to use large pages: some Windows "function" likes to push out around 4-6 GB of memory into swap at regular intervals, even though the freed memory is never used afterwards (so the swapping-out is not required, at least not at this size, but it happens anyway). With some buffers then on disk, stage 2 needs half an hour to multiple hours to recover. I wanted to avoid this by locking the pages in memory.

I also saw that your large pages are normally only aligned to 64 bits, since you write their size in those bits. But this is never used. Maybe removing this would increase the gains of using them?
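A quick way to check the Linux side would be an explicit huge-page mapping; a small test sketch, assuming huge pages were reserved beforehand (e.g. sysctl vm.nr_hugepages=16):
Code:
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 16UL * 1024 * 1024;   /* multiple of the 2 MB huge-page size */

    /* Explicit huge pages; fails unless pages were reserved first. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap(MAP_HUGETLB)");
        return 1;
    }
    /* /proc/self/smaps should now show KernelPageSize: 2048 kB
       for this range. */
    printf("huge-page mapping at %p\n", p);
    munmap(p, len);
    return 0;
}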
#524
P90 years forever!
Aug 2002
Yeehaw, FL
53·149 Posts

Quote:
The bulk of P-1 stage 2 memory allocation is for the two polynomials. These are allocated using gwalloc_array. Gwalloc_array tries to do a large page allocate. If that fails, gwalloc_array falls back to a regular malloc. From your description it seems that one polynomial is successfully allocated using large pages; the other fails and uses regular malloc.

It seems likely that Windows places a limit on the amount of memory that can be allocated using large pages. This makes sense, as large pages are slow/hard/impossible to swap out. By limiting the amount of large page memory, Windows will always have an adequate supply of small pages to swap out when needed. This is all speculation on my part.
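In outline, the pattern being described (a sketch of the idea only, with a hypothetical helper name, not the actual gwalloc_array code):
Code:
#include <stdlib.h>
#include <windows.h>

/* Prefer large pages, fall back to malloc on refusal. Caller must
   free with VirtualFree(p, 0, MEM_RELEASE) or free() depending on
   which path *used_large reports. */
static void *alloc_prefer_large_pages(size_t size, int *used_large)
{
    SIZE_T min = GetLargePageMinimum();

    if (min != 0) {
        void *p = VirtualAlloc(NULL, (size + min - 1) & ~(min - 1),
                               MEM_RESERVE | MEM_COMMIT | MEM_LARGE_PAGES,
                               PAGE_READWRITE);
        if (p != NULL) {
            *used_large = 1;
            return p;
        }
    }
    /* Refused: missing privilege, fragmentation, or a possible OS cap
       as speculated above. Small pages still work. */
    *used_large = 0;
    return malloc(size);
}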
#525
Aug 2002
London, UK
2²×23 Posts
Why do polymult helper threads sometimes run on different CPU cores, and sometimes all run on one CPU core?

Setup: v30.8 build 14, Core i9-10900 2.80GHz, 96GB RAM, 80GB allocated to Prime95 (day and night).

Worker 1: Running "Pminus1=" workload in the 61M range, using 9 cores. Stage 1 uses FMA3 FFT length 3360K. Stage 2 uses FMA3 FFT length 3840K.

Worker 2: Running wavefront "Pfactor=" workload in the 114M range, using 6 cores. Stage 1 uses FMA3 FFT length 6M. Stage 2 uses FMA3 FFT length 6912K.

These two workers are scheduled to run at different times of the day and do not overlap: one always stops before the other one starts. The local.txt [Worker #1] section contains "CoresPerTest=9" and the [Worker #2] section contains "CoresPerTest=6". These were chosen specifically to optimise throughput.

Worker 1 always uses separate cores, whereas Worker 2 only ever uses one core. You would think that using multiple cores would always be preferable to using just one for best throughput. Can someone more knowledgeable of P95's inner workings please advise: is this because of
(a) different work type (Pminus1 vs Pfactor),
(b) different FFT sizes, or
(c) some other factor?

Sample output from 61M Worker 1:
Code:
[Apr 14 17:46:50] Worker starting
[Apr 14 17:46:50] Setting affinity to run worker on CPU core #1
[Apr 14 17:46:50]
[Apr 14 17:46:50] P-1 on M61915481 with B1=1800000, B2=TBD
[Apr 14 17:46:51] Using FMA3 FFT length 3360K, Pass1=896, Pass2=3840, clm=1, 9 threads
[Apr 14 17:46:51] Setting affinity to run helper thread 1 on CPU core #2
[Apr 14 17:46:51] Setting affinity to run helper thread 2 on CPU core #3
[Apr 14 17:46:51] Setting affinity to run helper thread 4 on CPU core #5
[Apr 14 17:46:51] Setting affinity to run helper thread 3 on CPU core #4
[Apr 14 17:46:51] Setting affinity to run helper thread 6 on CPU core #7
[Apr 14 17:46:51] Setting affinity to run helper thread 8 on CPU core #9
[Apr 14 17:46:51] Setting affinity to run helper thread 5 on CPU core #6
[Apr 14 17:46:51] Setting affinity to run helper thread 7 on CPU core #8
[Apr 14 17:46:51] M61915481 stage 1 is 68.8076% complete.
<<SNIP>>
[Apr 15 09:24:16] M61915481 stage 1 is 99.3425% complete. Time: 264.541 sec.
[Apr 15 09:25:01] M61915481 stage 1 complete. 234142 transforms. Total time: 309.353 sec.
[Apr 15 09:25:23] Inversion of stage 1 result complete. 5 transforms, 1 modular inverse. Time: 22.387 sec.
[Apr 15 09:25:28] Available memory is 81872MB.
[Apr 15 09:25:29] Setting affinity to run helper thread 1 on CPU core #2
[Apr 15 09:25:29] Setting affinity to run helper thread 5 on CPU core #6
[Apr 15 09:25:29] Setting affinity to run helper thread 2 on CPU core #3
[Apr 15 09:25:29] Setting affinity to run helper thread 4 on CPU core #5
[Apr 15 09:25:29] Setting affinity to run helper thread 6 on CPU core #7
[Apr 15 09:25:29] Setting affinity to run helper thread 3 on CPU core #4
[Apr 15 09:25:29] Setting affinity to run helper thread 7 on CPU core #8
[Apr 15 09:25:29] Setting affinity to run helper thread 8 on CPU core #9
[Apr 15 09:25:29] Switching to FMA3 FFT length 3840K, Pass1=640, Pass2=6K, clm=1, 9 threads
[Apr 15 09:25:29] With trial factoring done to 2^74, optimal B2 is 763*B1 = 1373400000.
[Apr 15 09:25:29] If no prior P-1, chance of a new factor is 8.87%
[Apr 15 09:25:29] Estimated stage 2 vs. stage 1 runtime ratio: 1.129
[Apr 15 09:25:29] Using 81870MB of memory. D: 5610, 640x2133 polynomial multiplication.
[Apr 15 09:25:38] Setting affinity to run polymult helper thread on CPU core #2
[Apr 15 09:25:38] Setting affinity to run polymult helper thread on CPU core #3
[Apr 15 09:25:38] Setting affinity to run polymult helper thread on CPU core #4
[Apr 15 09:25:38] Setting affinity to run polymult helper thread on CPU core #5
[Apr 15 09:25:38] Setting affinity to run polymult helper thread on CPU core #6
[Apr 15 09:25:38] Setting affinity to run polymult helper thread on CPU core #7
[Apr 15 09:25:38] Setting affinity to run polymult helper thread on CPU core #8
[Apr 15 09:25:38] Setting affinity to run polymult helper thread on CPU core #9
[Apr 15 09:28:45] Stage 2 init complete. 19039 transforms. Time: 202.138 sec.
[Apr 15 09:34:31] M61915481 stage 2 at B2=244848450 [4.0485%]. Time: 345.379 sec.
<<SNIP>>
[Apr 15 11:42:04] M61915481 stage 2 complete. 1690661 transforms. Total time: 7998.324 sec.
[Apr 15 11:42:18] Stage 2 GCD complete. Time: 14.066 sec.
[Apr 15 11:42:18] M61915481 completed P-1, B1=1800000, B2=1378971660, Wi4: 1390BDEE

Sample output from 114M Worker 2:
Code:
[Apr 15 03:27:49] Optimal P-1 factoring of M114609203 using up to 81920MB of memory.
[Apr 15 03:27:49] Assuming no factors below 2^77 and 1.3 primality tests saved if a factor is found.
[Apr 15 03:27:49] Optimal bounds are B1=860000, B2=346295000
[Apr 15 03:27:49] Chance of finding a factor is an estimated 5.69%
[Apr 15 03:27:49]
[Apr 15 03:27:49] Using FMA3 FFT length 6M, Pass1=1536, Pass2=4K, clm=2, 6 threads
[Apr 15 03:27:49] Setting affinity to run helper thread 1 on CPU core #1
[Apr 15 03:27:49] Setting affinity to run helper thread 5 on CPU core #1
[Apr 15 03:27:49] Setting affinity to run helper thread 4 on CPU core #1
[Apr 15 03:27:49] Setting affinity to run helper thread 2 on CPU core #1
[Apr 15 03:27:49] Setting affinity to run helper thread 3 on CPU core #1
[Apr 15 03:47:29] M114609203 stage 1 is 8.0568% complete. Time: 1179.253 sec.
<<SNIP>>
[Apr 15 07:31:34] M114609203 stage 1 complete. 2481466 transforms. Total time: 14624.451 sec.
[Apr 15 07:32:22] Inversion of stage 1 result complete. 5 transforms, 1 modular inverse. Time: 48.410 sec.
[Apr 15 07:32:29] Available memory is 81872MB.
[Apr 15 07:32:30] Switching to FMA3 FFT length 6912K, Pass1=1536, Pass2=4608, clm=2, 6 threads
[Apr 15 07:32:30] Estimated stage 2 vs. stage 1 runtime ratio: 1.028
[Apr 15 07:32:30] Setting affinity to run helper thread 1 on CPU core #1
[Apr 15 07:32:30] Setting affinity to run helper thread 3 on CPU core #1
[Apr 15 07:32:30] Setting affinity to run helper thread 2 on CPU core #1
[Apr 15 07:32:30] Setting affinity to run helper thread 4 on CPU core #1
[Apr 15 07:32:30] Setting affinity to run helper thread 5 on CPU core #1
[Apr 15 07:32:30] Using 81827MB of memory. D: 2730, 288x1222 polynomial multiplication.
[Apr 15 07:32:41] Setting affinity to run polymult helper thread on CPU core #1
[Apr 15 07:32:41] Setting affinity to run polymult helper thread on CPU core #1
[Apr 15 07:32:41] Setting affinity to run polymult helper thread on CPU core #1
[Apr 15 07:32:41] Setting affinity to run polymult helper thread on CPU core #1
[Apr 15 07:32:41] Setting affinity to run polymult helper thread on CPU core #1
[Apr 15 07:35:13] Stage 2 init complete. 8097 transforms. Time: 170.295 sec.
[Apr 15 07:50:48] M114609203 stage 2 at B2=54493530 [7.2625%]. Time: 934.564 sec.
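For completeness, the local.txt fragments described above (section and key names exactly as quoted):
Code:
[Worker #1]
CoresPerTest=9

[Worker #2]
CoresPerTest=6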
#526
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
19A1₁₆ Posts
I think (c).

The Core i9-10900 is a 10-core device: https://ark.intel.com/content/www/us...-5-20-ghz.html

Prime95 may not be designed to "multiplex" real cores between multiple workers versus time of day. Worker 1 then gets its specified count of 9 real cores assigned to it at startup, leaving only 1 real core for assignment to worker 2. A simple test to verify that would be to reduce worker 1's core count temporarily and observe whether worker 2 gains the exact same number.

Maybe enabling hyperthreads would help you accomplish what you want? (Generally, employing 1 thread/core in anything other than TF is better than both hyperthreads/core.) Or just switch to a compromise of, say, 7 cores always, with no hyperthreading enabled in prime95? Or, if that core-multiplexing scenario works in other work types but not polymult, never mind. I don't recall ever trying to oversubscribe and time-share cores among multiple workers. Reading the source code of a recent version might clear things up.

Last fiddled with by kriesel on 2022-04-15 at 12:35
#527
Jun 2003
14FD₁₆ Posts

Quote:
If you want to run these weird configs, run two instances from different folders. Or make it a single worker for both work types combined and allocate all 10 cores.

Last fiddled with by axn on 2022-04-15 at 12:37
#528
Random Account
Aug 2009
4144₈ Posts
The two images below are from v30.8 B12. If I click the "OK" button on the Advanced Settings dialog, the warning dialog appears. The second image shows what is covered by the warning. It requests a value of 1 to 100, which already appears to be satisfied, so it must not be aware of the current values.