mersenneforum.org  

Old 2022-04-09, 02:10   #518
Zhangrc
 
"University student"
May 2021
Beijing, China

2²·5·13 Posts

@George:
Could you allow "Minutes between disk saves" to be set to less than 10?
I have several Colab instances that terminate randomly, and their run time is limited. However, I see no restriction on disk write frequency.
I have long planned to change the number to 1 or 2, so that when an instance is terminated only 1 or 2 minutes of work are lost, rather than 10.
Old 2022-04-09, 03:24   #519
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

53·149 Posts

Quote:
Originally Posted by Zhangrc View Post
@George:
Could you allow "Minutes between disk saves" to be set to less than 10?
I have several Colab instances that terminate randomly, and their run time is limited. However, I see no restriction on disk write frequency.
I have long planned to change the number to 1 or 2, so that when an instance is terminated only 1 or 2 minutes of work are lost, rather than 10.

You can set it to less than 10 in prime.txt. However, be aware that a disk write is not free. A not insignificant amount of CPU time is spent converting numbers to binary and writing to disk.
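
The trade-off between save frequency and lost work can be put into rough numbers. The sketch below is only an illustration: the 3 seconds of CPU time per save and the 600-minute run length are made-up assumptions, not measured Prime95 figures, and `save_tradeoff` is a hypothetical helper name. It assumes the instance is killed at a uniformly random moment, so on average half a save interval is lost.

```python
def save_tradeoff(interval_min, save_cost_sec, run_min):
    """Return (overhead_fraction, expected_loss_min) for one run.

    interval_min  - minutes between disk saves
    save_cost_sec - assumed CPU seconds spent per save (hypothetical value)
    run_min       - minutes the instance survives before it is terminated
    """
    saves = run_min // interval_min              # completed saves during the run
    overhead = saves * save_cost_sec / (run_min * 60)
    # With a uniformly random kill time, half an interval is lost on average.
    expected_loss = interval_min / 2
    return overhead, expected_loss


for interval in (1, 2, 10):
    overhead, loss = save_tradeoff(interval, save_cost_sec=3, run_min=600)
    print(f"{interval:>2} min between saves: "
          f"overhead {overhead:.2%}, expected loss {loss:.1f} min")
```

Under these assumed numbers, a 1-minute interval costs more in save overhead than a 10-minute one but loses far less work per termination. The real cost per save depends on the exponent size and the disk, so it would need to be measured.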
Old 2022-04-09, 15:36   #520
storm5510
Random Account
 
Aug 2009

2148₁₀ Posts

I ran into problems with v30.8 B12 yesterday evening. I wanted to see how it would handle a small ECM:

Code:
ECM2=N/A,1,2,4363,-1,110000000,11000000000,5
Being a bit familiar with M4363, I used it. I followed the standard convention of B2 = B1*100. I had the Stage 2 RAM set to 4GB. As soon as Prime95 got to Stage 2, it unloaded itself. I lowered the RAM to 3GB. Same result. This morning, I looked through Windows Event Logs and found the following:

Code:
Faulting application name: prime95.exe, version: 30.8.1.0, time stamp: 0x62421166
Faulting module name: prime95.exe, version: 30.8.1.0, time stamp: 0x62421166
Exception code: 0xc0000005
Fault offset: 0x000000000201b0cc
Faulting process id: 0x6ac
Faulting application start time: 0x01d84bb4cd2bb5d0
Faulting application path: C:\Prime95\prime95.exe
Faulting module path: C:\Prime95\prime95.exe
Report Id: 23d3ab23-209a-4c26-98eb-95f4dbf56784
Faulting package full name: 
Faulting package-relative application ID:
Everything above I ran on my primary system. i7-7700, 16GB RAM, Windows 10 v21H1 and so on. I decided to see if I could replicate the issue on a totally different machine. Xeon E5-1620, 8GB RAM and Windows 10 v21H1. Instead of unloading itself at the end of Stage 1, Prime95 froze as soon as it loaded. The Xeon system has been running P-1's with 30.8 for quite a few days with no problems.

Perhaps these two systems are no longer compatible with the ECM code. 29.x runs this without problems. I have a third system which runs Windows 7. It's not often that I start it for anything. I put 30.8 on it. It started all right with the ECM I quoted above. I set the RAM at 3 GB. This will be a really long process for that machine.

Edit: The run on the i5 (Windows 7) system also failed. A "program not responding" dialog appeared.
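
For anyone trying to reproduce this, the worktodo line above can be rebuilt from its parts. This is a small sketch, assuming the fields after `ECM2=N/A` are k, b, n, c (for k·b^n+c), then B1, B2 and the curve count. That field order is my reading of the line quoted above, and `ecm2_line` is a hypothetical helper, not part of Prime95.

```python
def ecm2_line(exponent, b1, b2_multiplier=100, curves=5, k=1, b=2, c=-1):
    """Build an ECM2 worktodo entry using the B2 = B1 * 100 convention."""
    b2 = b1 * b2_multiplier
    # "N/A" stands in for the assignment ID, as in the line quoted above.
    return f"ECM2=N/A,{k},{b},{exponent},{c},{b1},{b2},{curves}"


print(ecm2_line(4363, 110_000_000))
# ECM2=N/A,1,2,4363,-1,110000000,11000000000,5
```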

Last fiddled with by storm5510 on 2022-04-09 at 16:20
Old 2022-04-12, 10:38   #521
kruoli
 
"Oliver"
Sep 2017
Porta Westfalica, DE

3·73 Posts

Two observations:
  1. The exponent M43283 used a 2048 byte FFT, while M44633 used a 4608 byte FFT (ECM, stage 1 only, AVX-512). The second FFT is three to four times slower since it is more than double the size of the former. Is there no FFT size in between, e.g. 3072 or 4096?
  2. I am not able to get large page allocation to work (on Windows). I already execute Prime95 as administrator. Prime95 says the memory gets allocated as large pages, but Task Manager states otherwise. When I put memory pressure on the system, Prime95's memory gets written to disk, which should not happen with large pages.
Old 2022-04-12, 16:52   #522
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

53×149 Posts

Quote:
Originally Posted by kruoli View Post
Two observations:
  1. The exponent M43283 used a 2048 byte FFT, while M44633 used a 4608 byte FFT (ECM, stage 1 only, AVX-512). The second FFT is three to four times slower since it is more than double the size of the former. Is there no FFT size in between, e.g. 3072 or 4096?
  2. I am not able to get large page allocation to work (on Windows). I already execute Prime95 as administrator. Prime95 says the memory gets allocated as large pages, but Task Manager states otherwise. When I put memory pressure on the system, Prime95's memory gets written to disk, which should not happen with large pages.

There are fewer small FFT sizes available for AVX-512. Try adding CpuSupportsAVX512F=0 to local.txt.

I vaguely recall that large pages under Windows also required some registry settings. It's been many years since I wrote that code so I can't help much. I do remember that the procedure was so complex that I gave up on mainstream Windows users ever being able to use the feature.

Large pages on Linux are much easier. However, the performance increase, if any, is not measurable.
Old 2022-04-13, 11:42   #523
kruoli
 
"Oliver"
Sep 2017
Porta Westfalica, DE

3×73 Posts

With the new stage 2 for P-1, I think your buffer handling must have changed more than I originally thought. I investigated large pages a bit further. I copied your code in gwutil.c and found that it would correctly allocate large pages on a Windows 10 system without special settings and even without administrator rights.
So I double-checked the memory allocation of stage 2 in Task Manager again, and then it hit me: Half of the memory is allocated in large pages. The other half seems to be allocated differently. So maybe this is something that is fixable? It should also occur on Linux this way, but I have not yet checked this.
Why I want to use large pages: Some Windows "function" likes to push out around 4-6 GB of memory into swap in regular intervals, although the freed memory is never used after that (so the swapping-out is not required (at least in this size), but still happening). Having now some buffers on disk, stage 2 needs half an hour to multiple hours to recover. I wanted to avoid this by locking the pages in memory.
I also saw that your large pages are normally only aligned to 64 bits, since you write their size in those bits. But this is never used. Maybe removing this would increase the gains of using them?
Old 2022-04-13, 15:57   #524
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

53·149 Posts

Quote:
Originally Posted by kruoli View Post
With the new stage 2 for P-1, I think your buffer handling must have changed more than I originally thought. I investigated large pages a bit further. I copied your code in gwutil.c and found that it would correctly allocate large pages on a Windows 10 system without special settings and even without administrator rights.
So I double-checked the memory allocation of stage 2 in Task Manager again, and then it hit me: Half of the memory is allocated in large pages. The other half seems to be allocated differently. So maybe this is something that is fixable? It should also occur on Linux this way, but I have not yet checked this.

Interesting that the Windows implementation of large pages has improved.

The bulk of P-1 stage 2 memory allocation is for the two polynomials. These are allocated using gwalloc_array, which first tries a large page allocation. If that fails, gwalloc_array falls back to a regular malloc. From your description, it seems that one polynomial is successfully allocated using large pages while the other fails and uses regular malloc. It seems likely that Windows places a limit on the amount of memory that can be allocated using large pages. This makes sense, as large pages are slow/hard/impossible to swap out. By limiting the amount of large page memory, Windows will always have an adequate supply of small pages to swap out when needed. This is all speculation on my part.
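
The try-large-pages-then-fall-back behaviour described above can be illustrated with a toy model. This is not the gwalloc_array code and makes no operating system calls. The `large_page_limit_mb` budget, the array sizes and the `alloc_arrays` name are all hypothetical, chosen only to show how a cap on large page memory would leave one polynomial in large pages and the other in regular memory.

```python
def alloc_arrays(sizes_mb, large_page_limit_mb):
    """Toy model: each array tries large pages first, then regular memory."""
    remaining = large_page_limit_mb      # assumed OS cap on large page memory
    placements = []
    for size in sizes_mb:
        if size <= remaining:
            remaining -= size            # large page allocation succeeds
            placements.append("large pages")
        else:
            placements.append("regular malloc")   # fallback path
    return placements


# Two equally sized polynomials and a cap that only fits one of them.
print(alloc_arrays([40000, 40000], large_page_limit_mb=48000))
# ['large pages', 'regular malloc']
```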
Old 2022-04-15, 11:20   #525
Reboot It
 
Aug 2002
London, UK

2²×23 Posts
Not always using multiple CPU cores

Why do polymult helper threads sometimes run on all different CPU cores, and sometimes run all on one CPU core?

Setup: v30.8 build 14, Core i9-10900 2.80GHz, 96GB RAM, 80GB allocated to Prime95 (day and night)

Worker 1: Running "Pminus1=" workload in 61M range, using 9 cores.
Stage 1 uses FMA3 FFT length 3360K.
Stage 2 uses FMA3 FFT length 3840K.

Worker 2: Running wavefront "Pfactor=" workload in 114M range, using 6 cores.
Stage 1 uses FMA3 FFT length 6M.
Stage 2 uses FMA3 FFT length 6912K.

These two workers are scheduled to run at different times of the day and not overlap: one always stops before the other one starts.

local.txt [Worker #1] section contains "CoresPerTest=9" and [Worker #2] section contains "CoresPerTest=6". These were chosen specifically to optimise throughput.

Worker 1 always uses separate cores, whereas Worker 2 only ever uses one core. You would think that using multiple cores would always be preferable to using just one core for best throughput.

Can someone more knowledgeable about P95's inner workings please advise: is this because of
(a) different work type (Pminus1 vs Pfactor)
(b) different FFT sizes
(c) some other factor?


Sample output from 61M Worker 1:
Code:
[Apr 14 17:46:50] Worker starting
[Apr 14 17:46:50] Setting affinity to run worker on CPU core #1
[Apr 14 17:46:50] 
[Apr 14 17:46:50] P-1 on M61915481 with B1=1800000, B2=TBD
[Apr 14 17:46:51] Using FMA3 FFT length 3360K, Pass1=896, Pass2=3840, clm=1, 9 threads
[Apr 14 17:46:51] Setting affinity to run helper thread 1 on CPU core #2
[Apr 14 17:46:51] Setting affinity to run helper thread 2 on CPU core #3
[Apr 14 17:46:51] Setting affinity to run helper thread 4 on CPU core #5
[Apr 14 17:46:51] Setting affinity to run helper thread 3 on CPU core #4
[Apr 14 17:46:51] Setting affinity to run helper thread 6 on CPU core #7
[Apr 14 17:46:51] Setting affinity to run helper thread 8 on CPU core #9
[Apr 14 17:46:51] Setting affinity to run helper thread 5 on CPU core #6
[Apr 14 17:46:51] Setting affinity to run helper thread 7 on CPU core #8
[Apr 14 17:46:51] M61915481 stage 1 is 68.8076% complete.
<<SNIP>>
[Apr 15 09:24:16] M61915481 stage 1 is 99.3425% complete. Time: 264.541 sec.
[Apr 15 09:25:01] M61915481 stage 1 complete. 234142 transforms. Total time: 309.353 sec.
[Apr 15 09:25:23] Inversion of stage 1 result complete. 5 transforms, 1 modular inverse. Time: 22.387 sec.
[Apr 15 09:25:28] Available memory is 81872MB.
[Apr 15 09:25:29] Setting affinity to run helper thread 1 on CPU core #2
[Apr 15 09:25:29] Setting affinity to run helper thread 5 on CPU core #6
[Apr 15 09:25:29] Setting affinity to run helper thread 2 on CPU core #3
[Apr 15 09:25:29] Setting affinity to run helper thread 4 on CPU core #5
[Apr 15 09:25:29] Setting affinity to run helper thread 6 on CPU core #7
[Apr 15 09:25:29] Setting affinity to run helper thread 3 on CPU core #4
[Apr 15 09:25:29] Setting affinity to run helper thread 7 on CPU core #8
[Apr 15 09:25:29] Setting affinity to run helper thread 8 on CPU core #9
[Apr 15 09:25:29] Switching to FMA3 FFT length 3840K, Pass1=640, Pass2=6K, clm=1, 9 threads
[Apr 15 09:25:29] With trial factoring done to 2^74, optimal B2 is 763*B1 = 1373400000.
[Apr 15 09:25:29] If no prior P-1, chance of a new factor is 8.87%
[Apr 15 09:25:29] Estimated stage 2 vs. stage 1 runtime ratio: 1.129
[Apr 15 09:25:29] Using 81870MB of memory.  D: 5610, 640x2133 polynomial multiplication.
[Apr 15 09:25:38] Setting affinity to run polymult helper thread on CPU core #2
[Apr 15 09:25:38] Setting affinity to run polymult helper thread on CPU core #3
[Apr 15 09:25:38] Setting affinity to run polymult helper thread on CPU core #4
[Apr 15 09:25:38] Setting affinity to run polymult helper thread on CPU core #5
[Apr 15 09:25:38] Setting affinity to run polymult helper thread on CPU core #6
[Apr 15 09:25:38] Setting affinity to run polymult helper thread on CPU core #7
[Apr 15 09:25:38] Setting affinity to run polymult helper thread on CPU core #8
[Apr 15 09:25:38] Setting affinity to run polymult helper thread on CPU core #9
[Apr 15 09:28:45] Stage 2 init complete. 19039 transforms. Time: 202.138 sec.
[Apr 15 09:34:31] M61915481 stage 2 at B2=244848450 [4.0485%].  Time: 345.379 sec.
<<SNIP>>
[Apr 15 11:42:04] M61915481 stage 2 complete. 1690661 transforms. Total time: 7998.324 sec.
[Apr 15 11:42:18] Stage 2 GCD complete. Time: 14.066 sec.
[Apr 15 11:42:18] M61915481 completed P-1, B1=1800000, B2=1378971660, Wi4: 1390BDEE
Sample output from 114M Worker 2:
Code:
[Apr 15 03:27:49] Optimal P-1 factoring of M114609203 using up to 81920MB of memory.
[Apr 15 03:27:49] Assuming no factors below 2^77 and 1.3 primality tests saved if a factor is found.
[Apr 15 03:27:49] Optimal bounds are B1=860000, B2=346295000
[Apr 15 03:27:49] Chance of finding a factor is an estimated 5.69%
[Apr 15 03:27:49] 
[Apr 15 03:27:49] Using FMA3 FFT length 6M, Pass1=1536, Pass2=4K, clm=2, 6 threads
[Apr 15 03:27:49] Setting affinity to run helper thread 1 on CPU core #1
[Apr 15 03:27:49] Setting affinity to run helper thread 5 on CPU core #1
[Apr 15 03:27:49] Setting affinity to run helper thread 4 on CPU core #1
[Apr 15 03:27:49] Setting affinity to run helper thread 2 on CPU core #1
[Apr 15 03:27:49] Setting affinity to run helper thread 3 on CPU core #1
[Apr 15 03:47:29] M114609203 stage 1 is 8.0568% complete. Time: 1179.253 sec.
<<SNIP>>
[Apr 15 07:31:34] M114609203 stage 1 complete. 2481466 transforms. Total time: 14624.451 sec.
[Apr 15 07:32:22] Inversion of stage 1 result complete. 5 transforms, 1 modular inverse. Time: 48.410 sec.
[Apr 15 07:32:29] Available memory is 81872MB.
[Apr 15 07:32:30] Switching to FMA3 FFT length 6912K, Pass1=1536, Pass2=4608, clm=2, 6 threads
[Apr 15 07:32:30] Estimated stage 2 vs. stage 1 runtime ratio: 1.028
[Apr 15 07:32:30] Setting affinity to run helper thread 1 on CPU core #1
[Apr 15 07:32:30] Setting affinity to run helper thread 3 on CPU core #1
[Apr 15 07:32:30] Setting affinity to run helper thread 2 on CPU core #1
[Apr 15 07:32:30] Setting affinity to run helper thread 4 on CPU core #1
[Apr 15 07:32:30] Setting affinity to run helper thread 5 on CPU core #1
[Apr 15 07:32:30] Using 81827MB of memory.  D: 2730, 288x1222 polynomial multiplication.
[Apr 15 07:32:41] Setting affinity to run polymult helper thread on CPU core #1
[Apr 15 07:32:41] Setting affinity to run polymult helper thread on CPU core #1
[Apr 15 07:32:41] Setting affinity to run polymult helper thread on CPU core #1
[Apr 15 07:32:41] Setting affinity to run polymult helper thread on CPU core #1
[Apr 15 07:32:41] Setting affinity to run polymult helper thread on CPU core #1
[Apr 15 07:35:13] Stage 2 init complete. 8097 transforms. Time: 170.295 sec.
[Apr 15 07:50:48] M114609203 stage 2 at B2=54493530 [7.2625%].  Time: 934.564 sec.
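
A quick way to spot this situation in a saved log is to count which cores the helper threads were pinned to. The sketch below is only an illustration: it relies on the "Setting affinity to run ... helper thread ... on CPU core #N" wording seen in the output above, and `helper_cores` is a hypothetical helper name. It ignores the "run worker on CPU core" line on purpose.

```python
import re
from collections import Counter

# Matches both "helper thread 3 on CPU core #4" and
# "polymult helper thread on CPU core #4".
AFFINITY = re.compile(
    r"Setting affinity to run (?:polymult )?helper thread(?: \d+)? "
    r"on CPU core #(\d+)"
)


def helper_cores(log_text):
    """Return {core number: how many helper threads were pinned to it}."""
    return dict(Counter(int(m.group(1)) for m in AFFINITY.finditer(log_text)))


sample = """\
[Apr 15 03:27:49] Setting affinity to run helper thread 1 on CPU core #1
[Apr 15 03:27:49] Setting affinity to run helper thread 5 on CPU core #1
[Apr 15 07:32:41] Setting affinity to run polymult helper thread on CPU core #1
"""
cores = helper_cores(sample)
print(cores)                                   # {1: 3}
print("all on one core:", len(cores) == 1)     # all on one core: True
```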
Old 2022-04-15, 12:23   #526
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

19A1₁₆ Posts

I think c)
Core i9-10900 is a 10 core device. https://ark.intel.com/content/www/us...-5-20-ghz.html
Prime95 may not be designed to "multiplex" real cores between multiple workers versus time of day.
Then worker 1 gets the specified count of 9 real cores assigned at startup. There's only 1 real core left for assignment to worker 2.
A simple test to verify that would be to reduce worker 1's core count temporarily and observe whether worker 2 gains the exact same number.

Maybe enabling hyperthreads would help you accomplish what you want? (Generally employing 1 thread/core in other than TF is better than both hyperthreads/core.)
Or just switch to a compromise of 7? cores always and no hyperthreading enabled in prime95?

Or, if that core multiplexing scenario works in other work types, but not polymult, never mind. I don't recall ever trying to oversubscribe and time-share cores among multiple workers.

Reading the source code of a recent version might clear things up.

Last fiddled with by kriesel on 2022-04-15 at 12:35
Old 2022-04-15, 12:36   #527
axn
 
Jun 2003

14FD₁₆ Posts

Quote:
Originally Posted by Reboot It View Post
Setup: v30.8 build 14, Core i9-10900 2.80GHz, 96GB RAM, 80GB allocated to Prime95 (day and night)

You have a 10 core CPU. How exactly is P95 supposed to allocate 9+6 cores? The workers may run at different times, but P95 doesn't know that.

If you want to run these weird configs, run two instances from different folders. Or make it a single worker for both work types combined and allocate all 10 cores.
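
The arithmetic behind that objection is simple enough to check before editing local.txt. A minimal sketch, with `core_shortfall` as a hypothetical helper name: it just sums the CoresPerTest values across workers and compares them with the physical core count.

```python
def core_shortfall(cores_per_worker, physical_cores):
    """Return how many more cores the workers ask for than the CPU has."""
    return max(0, sum(cores_per_worker) - physical_cores)


# CoresPerTest=9 and CoresPerTest=6 on a 10 core CPU.
print(core_shortfall([9, 6], physical_cores=10))   # 5
```

A result of 0 means every worker can get its own cores. Anything above 0 means the requests overlap, whatever the time-of-day schedule says.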

Last fiddled with by axn on 2022-04-15 at 12:37
Old 2022-04-18, 18:34   #528
storm5510
Random Account
 
Aug 2009

4144₈ Posts

The two images below are from v30.8 B12. If I click the "OK" button on the Advanced Settings, then the warning dialog appears. The second image shows what is covered by the warning. It is requesting a value of 1 to 100, which already appears to be satisfied. It must not be aware of the current values.
Attached Thumbnails: p95.JPG (38.2 KB), p95a.JPG (41.5 KB)
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
