mersenneforum.org

Thinking out loud about getting under 20M unfactored exponents (https://www.mersenneforum.org/showthread.php?t=22476)

James Heinrich 2020-12-24 05:56

I happened to have a number of other programs open at one point, and when v30.4 started stage 2, Windows complained about low memory. This seems to have put Prime95 into a "stuck" state, where the worker window says "P-1 stage2 init" but it just sits there at 100% of a single core indefinitely (I noticed it after it had run for 53 minutes getting nowhere). I force-closed Prime95 (it wouldn't close normally) and restarted it, but the hang is reproducible. I've sent George the savefile for debugging.

axn 2020-12-24 06:11

[QUOTE=James Heinrich;567204]Yes, I gave it "wrong" values on purpose[/quote]
You no longer need to do this. Even "Pminus1" will calculate the optimal value -- you just need to pick a B1 for it. But giving it the correct TF depth is essential, otherwise the choice will be suboptimal.

Actually, I wanted you to modify the calculator to take into account the improved Stage 2 and give the optimal B1/B2 for given probability / GHzDay target.

[QUOTE=James Heinrich;567205]I happened to have a number of other programs open at one point, and when v30.4 started stage 2, Windows complained about low memory. This seems to have put Prime95 into a "stuck" state, where the worker window says "P-1 stage2 init" but it just sits there at 100% of a single core indefinitely (I noticed it after it had run for 53 minutes getting nowhere). I force-closed Prime95 (it wouldn't close normally) and restarted it, but the hang is reproducible. I've sent George the savefile for debugging.[/QUOTE]
This sounds similar to the "infinite loop" I encountered. For now, I'm using per-worker hard memory limits to avoid it.

Dylan14 2020-12-24 14:51

There appears to be a crash when running an assignment that has a save file from a previous version. I've attached a picture of the output prior to it crashing after it completes stage 2 init.

Note, this is with the worktodo line

[code]Pfactor=<aid>,1,2,28009823,-1,70,3[/code]
I tried to see whether it also crashes after a fresh start: no, it seems to work fine at this point.
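For readers unfamiliar with the worktodo line above, here is a minimal Python sketch that splits it into fields. The field meanings (assignment ID, then k, b, n, c describing k*b^n+c, then the trial-factoring depth in bits and the number of primality tests saved) follow the commonly documented Pfactor layout; the class and function names are hypothetical and are not part of Prime95.

```python
from dataclasses import dataclass


@dataclass
class PfactorEntry:
    """One Pfactor worktodo line (field names are illustrative assumptions)."""
    assignment_id: str
    k: int
    b: int
    n: int
    c: int
    tf_bits: int        # how far the number has been trial factored
    tests_saved: float  # primality tests saved if a factor is found


def parse_pfactor(line: str) -> PfactorEntry:
    """Parse 'Pfactor=aid,k,b,n,c,tf_bits,tests_saved' into a PfactorEntry."""
    keyword, _, rest = line.strip().partition("=")
    if keyword != "Pfactor":
        raise ValueError(f"not a Pfactor line: {line!r}")
    fields = rest.split(",")
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    aid, k, b, n, c, tf_bits, tests_saved = fields
    return PfactorEntry(aid, int(k), int(b), int(n), int(c),
                        int(tf_bits), float(tests_saved))


entry = parse_pfactor("Pfactor=<aid>,1,2,28009823,-1,70,3")
print(f"{entry.k}*{entry.b}^{entry.n}{entry.c:+d}, TF to {entry.tf_bits} bits")
```

Read this way, the line describes P-1 on 2^28009823-1, trial factored to 70 bits.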

James Heinrich 2020-12-24 14:57

[QUOTE=Dylan14;567222]There appears to be a crash when running an assignment that has a save file from a previous version.[/QUOTE]
But not necessarily always -- when I upgraded mid-assignment it also said "cannot continue stage 2", added a bit more B1, and then completed stage2 without problem.
(It did get stuck later on a different assignment, as described above, so it's not entirely stable).

James Heinrich 2020-12-25 00:02

Just mentioning I found my first factor with v30.4:
[M]M17840447[/M] has a 75.272-bit (23-digit) factor: [url=https://www.mersenne.ca/M17840447]45606749097226437406729[/url] (P-1,B1=404000,B2=27105000)
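A reported factor like this can be checked independently. The sketch below is not how Prime95 verifies factors; it only uses the fact that f divides 2^p - 1 exactly when 2^p mod f equals 1, plus two standard properties of factors of Mersenne numbers with prime exponent (f = 2kp + 1 and f ≡ ±1 mod 8). The function name is hypothetical.

```python
def divides_mersenne(p: int, f: int) -> bool:
    """Return True if f divides 2**p - 1 (checked by modular exponentiation)."""
    return pow(2, p, f) == 1


p = 17840447
f = 45606749097226437406729

# Direct check: 2^p must be congruent to 1 modulo f.
print("divides M(p):", divides_mersenne(p, f))

# Structural checks that any factor of M(p), p an odd prime, satisfies.
k, remainder = divmod(f - 1, 2 * p)
print("f = 2kp + 1:", remainder == 0, "with k =", k)
print("f mod 8 in {1, 7}:", f % 8 in (1, 7))

# Size of the factor, for comparison with the reported bit and digit counts.
print("bit length:", f.bit_length(), "decimal digits:", len(str(f)))
```

P-1 finds such a factor when k is smooth with respect to the chosen B1 and B2 bounds, which is why k is printed as well.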

Prime95 2020-12-25 02:26

Win64 version 30.4 build 3: [url]https://www.dropbox.com/s/1nbpfh37tzd57gb/p95v304b3.win64.zip?dl=0[/url]

I fixed 5 bugs found by you folks:
1) Bad checksum for submitting ECM results manually.
2) There were cases where prime95 did not reduce stage 2 memory to conform to current memory settings.
3) Possible infinite loop during ECM stage 2 init (and maybe P-1 too).
4) Rare memory corruption re-figuring a stage 2 plan.
5) Percent complete in stage 2 corrected.

I'm not convinced these explain all the undesirable behaviors described by James, nordi, and axn. Give it a try and let me know of any troubles.

I may not be able to make a Linux build until after Christmas.

Prime95 2020-12-26 18:49

Linux64 30.4 build 3: [url]https://www.dropbox.com/s/9yadeo8nn9aeajw/p95v304b3.linux64.tar.gz?dl=0[/url]

nordi 2020-12-27 14:24

[QUOTE=Prime95;567284]
2) There were cases where prime95 did not reduce stage 2 memory to conform to current memory settings.
[/QUOTE]
I tried again on Linux and mprime kept running much longer than before, but was still stopped by the kernel's OOM killer. It used 121GB when configured to use just 50GB.


One thing I noticed is that Stage 2 init frequently needs a long time (~1 minute instead of 5 seconds):
[quote]
[Worker #22 Dec 27 13:49] Stage 2 init complete. 62496 transforms, 1 modular inverses. Time: 59.057 sec.
[Worker #30 Dec 27 13:49] Stage 2 init complete. 62496 transforms, 1 modular inverses. Time: 54.013 sec.
[Worker #28 Dec 27 13:50] Stage 2 init complete. 62496 transforms, 1 modular inverses. Time: 57.211 sec.
[/quote]
I also saw this in previous versions, but only during startup. It gives the impression that the threads are competing or waiting for a global lock. Maybe that waiting time confuses the allocation logic?
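To compare init times across workers, the quoted log lines can be summarized with a short script. This is a minimal sketch: the regular expression is an assumption based only on the three lines quoted above, and the function name is hypothetical.

```python
import re
from statistics import mean

# The three log lines quoted above, copied verbatim.
LOG = """\
[Worker #22 Dec 27 13:49] Stage 2 init complete. 62496 transforms, 1 modular inverses. Time: 59.057 sec.
[Worker #30 Dec 27 13:49] Stage 2 init complete. 62496 transforms, 1 modular inverses. Time: 54.013 sec.
[Worker #28 Dec 27 13:50] Stage 2 init complete. 62496 transforms, 1 modular inverses. Time: 57.211 sec.
"""

# Captures the worker number and the reported init time in seconds.
PATTERN = re.compile(
    r"\[Worker #(\d+) [^\]]*\] Stage 2 init complete\..*?Time: ([\d.]+) sec\."
)


def stage2_init_times(log_text: str) -> dict[int, float]:
    """Map worker number to its reported stage 2 init time in seconds."""
    return {int(worker): float(secs) for worker, secs in PATTERN.findall(log_text)}


times = stage2_init_times(LOG)
average = mean(list(times.values()))
print(f"{len(times)} workers, mean init time {average:.3f} sec")
```

Running the same summary over a longer log would show whether the slow inits cluster at moments when many workers enter stage 2 together.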

Prime95 2020-12-27 16:42

[QUOTE=nordi;567446]I tried again on Linux and mprime kept running much longer than before, but was still stopped by the kernel's OOM killer. It used 121GB when configured to use just 50GB.
[/QUOTE]

Can you describe your setup? 32 workers? 50GB memory. Worktodo.txt is? MaxHighMemWorkers?

Can you provide the screen output (say 200 lines of output) at the time the OOM occurred?

I'll try to reproduce on my dinky quad-core.

nordi 2020-12-27 17:48

[QUOTE=Prime95;567454]Can you describe your setup? 32 workers? 50GB memory. Worktodo.txt is? MaxHighMemWorkers?
[/QUOTE]
Yes, 32 workers with "Memory=50000" in local.txt. MaxHighMemWorkers is not set in my config. The machine has 128GB of RAM.
[QUOTE=Prime95;567454]
Can you provide the screen output (say 200 lines of output) at the time the OOM occurred?[/QUOTE]
That and the worktodo were sent via PM.
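To put the numbers from this setup side by side, here is a small back-of-the-envelope sketch. It assumes that "Memory=50000" is in megabytes and that an even split across workers is a meaningful yardstick; the thread does not say how mprime actually divides memory among workers, so the per-worker figure is only illustrative.

```python
configured_mb = 50_000  # "Memory=50000" from local.txt (assumed to be MB)
workers = 32            # number of workers reported in the thread
observed_gb = 121       # usage reported when the OOM killer fired
physical_gb = 128       # installed RAM reported in the thread

# Hypothetical even share per worker if all 32 ran stage 2 at once.
per_worker_mb = configured_mb / workers

# How far actual usage exceeded the configured limit.
overshoot = observed_gb * 1000 / configured_mb

# Share of physical RAM consumed at the time of the OOM kill.
ram_fraction = observed_gb / physical_gb

print(f"even share per worker: {per_worker_mb:.1f} MB")
print(f"usage vs. limit: {overshoot:.2f}x")
print(f"fraction of physical RAM: {ram_fraction:.0%}")
```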

ixfd64 2020-12-27 20:14

I've seen an issue where P-1 continues past 100% in stage 1 until the worker is stopped. However, this has only happened about twice. I have no idea if it's related to any of the above bugs.

