mersenneforum.org > Data Thinking out loud about getting under 20M unfactored exponents

 2020-12-24, 05:56 #276 James Heinrich     "James Heinrich" May 2004 ex-Northern Ontario 2·3·5·109 Posts I happened to have a number of other programs open at one point, and when v30.4 started up stage 2, Windows complained about low memory. This seems to have gotten Prime95 into a "stuck" state, where the worker window says "P-1 stage2 init" but it just sits there at 100% of a single core indefinitely (I noticed it after it had run for 53 minutes getting nowhere). I force-closed Prime95 (it wouldn't close normally) and restarted it, but the hang is reproducible. I've sent George the savefile for debugging.
2020-12-24, 06:11   #277
axn

Jun 2003

2²·3⁵·5 Posts

Quote:
 Originally Posted by James Heinrich Yes, I gave it "wrong" values on purpose
You no longer need to do this. Even "Pminus1" will calculate the optimal value -- you just need to pick a B1 for it. But giving it the correct TF depth is essential; otherwise the choice will be suboptimal.

Actually, I wanted you to modify the calculator to take into account the improved Stage 2 and give the optimal B1/B2 for given probability / GHzDay target.

Quote:
 Originally Posted by James Heinrich I happened to have a number of other programs open at one point, and when v30.4 started up stage 2, Windows complained about low memory. This seems to have gotten Prime95 into a "stuck" state, where the worker window says "P-1 stage2 init" but it just sits there at 100% of a single core indefinitely (I noticed it after it had run for 53 minutes getting nowhere). I force-closed Prime95 (it wouldn't close normally) and restarted it, but the hang is reproducible. I've sent George the savefile for debugging.
This sounds almost similar to the "infinite loop" I encountered. Anyway, currently I'm using hard limits per-worker to avoid this.

 2020-12-24, 14:51 #278 Dylan14     "Dylan" Mar 2017 1062₈ Posts There appears to be a crash when running an assignment that has a save file from a previous version. I've attached a picture of the output just before the crash, which happens after stage 2 init completes. Note that this is with the worktodo line Code: Pfactor=,1,2,28009823,-1,70,3 I also checked whether it crashes after a fresh start: it does not, and seems to work fine at this point.
2020-12-24, 14:57   #279
James Heinrich

"James Heinrich"
May 2004
ex-Northern Ontario

2×3×5×109 Posts

Quote:
 Originally Posted by Dylan14 There appears to be a crash when running an assignment that has a save file from a previous version.
But not necessarily always -- when I upgraded mid-assignment it also said "cannot continue stage 2", added a bit more B1, and then completed stage2 without problem.
(It did get stuck later on a different assignment, as described above, so it's not entirely stable).

 2020-12-25, 00:02 #280 James Heinrich     "James Heinrich" May 2004 ex-Northern Ontario 2×3×5×109 Posts Just mentioning I found my first factor with v30.4: M17840447 has a 75.272-bit (23-digit) factor: 45606749097226437406729 (P-1,B1=404000,B2=27105000)
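The reported factor can be checked independently of Prime95. The following is a minimal editorial sketch in Python, not part of any GIMPS tool: it relies on the fact that q divides 2^p - 1 exactly when 2^p mod q equals 1, and that every factor of a Mersenne number with prime exponent p has the form 2kp + 1. The variable names are chosen for this illustration only.

```python
import math

p = 17840447                      # exponent of M17840447, from the post
q = 45606749097226437406729       # reported factor, from the post

# q divides 2^p - 1 exactly when 2^p mod q == 1.
divides = pow(2, p, q) == 1

# Factors of a Mersenne number with prime exponent p have the form 2*k*p + 1.
has_mersenne_form = q % (2 * p) == 1

bits = math.log2(q)               # size of the factor in bits
digits = len(str(q))              # size of the factor in decimal digits

print(divides, has_mersenne_form, f"{bits:.3f}", digits)
```

The three-argument `pow` performs modular exponentiation, so the check never builds the 17.8-million-bit number itself.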
 2020-12-25, 02:26 #281 Prime95 P90 years forever!     Aug 2002 Yeehaw, FL 1CBE₁₆ Posts Win64 version 30.4 build 3: https://www.dropbox.com/s/1nbpfh37tz...win64.zip?dl=0
I fixed 5 bugs found by you folks:
1) Bad checksum for submitting ECM results manually.
2) There were cases where prime95 did not reduce stage 2 memory to conform to current memory settings.
3) Possible infinite loop during ECM stage 2 init (and maybe P-1 too).
4) Rare memory corruption re-figuring a stage 2 plan.
5) Percent complete in stage 2 corrected.
I'm not convinced these explain all the undesirable behaviors described by James, nordi, and axn. Give it a try and let me know of any troubles. I may not be able to make a Linux build until after Christmas.
 2020-12-26, 18:49 #282 Prime95 P90 years forever!     Aug 2002 Yeehaw, FL 2×13×283 Posts Linux64 30.4 build 3: https://www.dropbox.com/s/9yadeo8nn9...64.tar.gz?dl=0
2020-12-27, 14:24   #283
nordi

Dec 2016

2³×7 Posts

Quote:
 Originally Posted by Prime95 2) There were cases where prime95 did not reduce stage 2 memory to conform to current memory settings.
I tried again on Linux and mprime kept running much longer than before, but was still stopped by the kernel's OOM killer. It used 121GB when configured to use just 50GB.

One thing I noticed is that Stage 2 init frequently needs a long time (~1 minute instead of 5 seconds):
Quote:
 [Worker #22 Dec 27 13:49] Stage 2 init complete. 62496 transforms, 1 modular inverses. Time: 59.057 sec. [Worker #30 Dec 27 13:49] Stage 2 init complete. 62496 transforms, 1 modular inverses. Time: 54.013 sec. [Worker #28 Dec 27 13:50] Stage 2 init complete. 62496 transforms, 1 modular inverses. Time: 57.211 sec.
I also saw this in previous versions, but only during startup. It gives the impression that the threads are competing for, or waiting on, a global lock. Maybe that waiting time confuses the allocation logic?
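To compare init times across many workers, the quoted lines can be parsed mechanically. This is a rough editorial sketch in Python: the regular expression is inferred only from the three lines quoted above, so other mprime versions may format the line differently, and the 30-second "slow" threshold and the helper name `init_times` are arbitrary assumptions.

```python
import re
from statistics import mean

# The three log lines quoted in the post above.
log = """\
[Worker #22 Dec 27 13:49] Stage 2 init complete. 62496 transforms, 1 modular inverses. Time: 59.057 sec.
[Worker #30 Dec 27 13:49] Stage 2 init complete. 62496 transforms, 1 modular inverses. Time: 54.013 sec.
[Worker #28 Dec 27 13:50] Stage 2 init complete. 62496 transforms, 1 modular inverses. Time: 57.211 sec.
"""

# Pattern inferred from these lines only; other mprime versions may differ.
PATTERN = re.compile(
    r"\[Worker #(\d+) [^\]]*\] Stage 2 init complete\..*?Time: ([\d.]+) sec\."
)

def init_times(text):
    """Return (worker number, init time in seconds) for each matching line."""
    return [(int(w), float(t)) for w, t in PATTERN.findall(text)]

times = init_times(log)
slow = [(w, t) for w, t in times if t > 30.0]   # 30 s threshold is arbitrary
print(times)
print(f"mean init time: {mean(t for _, t in times):.1f} s, slow workers: {len(slow)}")
```

Run over a full log, the same function would show whether the slow inits cluster at the same timestamps, which is what lock contention would look like.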

2020-12-27, 16:42   #284
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

2·13·283 Posts

Quote:
 Originally Posted by nordi I tried again on Linux and mprime kept running much longer than before, but was still stopped by the kernel's OOM killer. It used 121GB when configured to use just 50
Can you describe your setup? 32 workers? 50GB memory. Worktodo.txt is? MaxHighMemWorkers?

Can you provide the screen output (say 200 lines of output) at the time the OOM occurred?

I'll try to reproduce on my dinky quad-core.

Last fiddled with by Prime95 on 2020-12-27 at 16:44

2020-12-27, 17:48   #285
nordi

Dec 2016

2³×7 Posts

Quote:
 Originally Posted by Prime95 Can you describe your setup? 32 workers? 50GB memory. Worktodo.txt is? MaxHighMemWorkers?
Yes, 32 workers with "Memory=50000" in local.txt. The MaxHighMemWorkers is not set in my config. The machine has 128GB of RAM.
Quote:
 Originally Posted by Prime95 Can you provide the screen output (say 200 lines of output) at the time the OOM occurred?
That and the worktodo were sent via PM.
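For scale, the numbers in this exchange can be put side by side. This is back-of-the-envelope arithmetic in Python, written as an editorial sketch: it assumes "Memory=50000" is in megabytes, treats 1 GB as 1000 MB as the posts do, and the even split across workers is hypothetical, since the program may hand out stage 2 memory unevenly.

```python
workers = 32
configured_mb = 50000          # "Memory=50000" from local.txt, taken to be in MB
observed_gb = 121              # usage reported when the OOM killer stepped in
ram_gb = 128                   # physical RAM in the machine

# Hypothetical even split; the real scheduler may hand out memory unevenly.
even_share_mb = configured_mb / workers

# Average footprint per worker implied by the observed total.
observed_share_gb = observed_gb / workers

# How far the observed usage exceeded the configured limit (1 GB = 1000 MB here).
overshoot = observed_gb / (configured_mb / 1000)

print(f"{even_share_mb:.1f} MB budget vs {observed_share_gb:.2f} GB observed per worker")
print(f"overshoot: {overshoot:.2f}x, headroom left: {ram_gb - observed_gb} GB")
```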

 2020-12-27, 20:14 #286 ixfd64 Bemusing Prompter     "Danny" Dec 2002 California 5×11×43 Posts I've seen an issue where P-1 continues past 100% in stage 1 until the worker is stopped. However, this has only happened like two times. I have no idea if it's related to any of the above bugs. Last fiddled with by ixfd64 on 2020-12-27 at 20:16


All times are UTC.
