mersenneforum.org  

Old 2023-01-23, 06:59   #133
bur
 
 
Aug 2020
79*6581e-4;3*2539e-3

2·5·73 Posts

Quote:
Originally Posted by Rubiksmath
fiddling around with the memory each worker was allowed individually
Sounds like a nice workaround. And allocating fixed amounts of memory seems like a good idea anyway if you do a bunch of similar work, doesn't it?

edit: Works very well so far. No crashes, and total memory usage has increased significantly, from about 14 GB to 28 GB. Or is that caused by the simultaneous switch to a larger exponent?

Quote:
Originally Posted by Prime95
This may be the bug LaurV reported and is fixed in 30.10. I'll try to get a 30.10 mprime built for you
Thanks.

Last fiddled with by bur on 2023-01-23 at 07:28
Old 2023-01-23, 08:49   #134
bur
 
 
Aug 2020
79*6581e-4;3*2539e-3

730₁₀ Posts

Update: Unless I'm using very conservative Memory=x settings, it eventually still tries to restart a worker and crashes. So I'll wait for 30.10... :)
Old 2023-01-25, 05:46   #135
Prime95
P90 years forever!
 
 
Aug 2002
Yeehaw, FL

2057₁₆ Posts

30.10 build 3

PRE-BETA software!

Since last build the occasional crash at stage 2 init is fixed.

Caveats? Same as previous builds:
Save files during ECM stage 2 are still broken (writing one will probably crash)
Stage 2 time estimates and optimal B2 bounds could be off
Estimates of stage 2 memory consumed may be off
Further stage 2 multithreading improvements are needed.
Stage 2 is pretty verbose, there's lots of code cleanup in my future.

Windows 64-bit: https://mersenne.org/ftp_root/gimps/...10b3.win64.zip
Linux 64-bit: https://mersenne.org/ftp_root/gimps/...linux64.tar.gz

Last fiddled with by Prime95 on 2023-01-25 at 05:47
Old 2023-01-25, 08:52   #136
bur
 
 
Aug 2020
79*6581e-4;3*2539e-3

730₁₀ Posts

Unfortunately it still crashes:

Code:
[Worker #12 Jan 25 08:35] Stage 1 complete. 4177752 transforms, 1 modular inverses. Total time: 395.228 sec.
[Worker #12 Jan 25 08:35] Available memory is 9095MB.
[Worker #12 Jan 25 08:35] Switching to FMA3 FFT length 72K, Pass1=384, Pass2=192, clm=2
[Worker #12 Jan 25 08:35] Optimal B2 is 1562*B1 = 1562000000.  Actual B2 will be 1562145585.
[Worker #12 Jan 25 08:35] Estimated stage 2 vs. stage 1 runtime ratio: 0.315
[Worker #9 Jan 25 08:35] Restarting worker with new memory settings.
[Worker #8 Jan 25 08:35] PolyG built.  Time: 2.507 sec.
[Worker #4 Jan 25 08:35] M1281979 curve 2 stage 1 at prime 640957 [64.09%]. Time: 109.104 sec.
[Worker #7 Jan 25 08:35] PolyG built.  Time: 1.653 sec.
[Worker #8 Jan 25 08:35] PolyH built.  Time: 1.070 sec.
Segmentation fault
I tried both with and without Memory=xxxx restriction.
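For anyone wanting to check their own logs for the same pattern, here is a minimal Python sketch that looks for a "Restarting worker" message followed by a segmentation fault. The line layout is only assumed from the excerpt above, and the function names are made up for illustration; this is not part of mprime.

```python
import re

# Assumed line shape, taken from the log excerpt above:
# "[Worker #<n> <Mon> <day> <HH:MM>] <message>"
LINE_RE = re.compile(r"^\[Worker #(\d+) (\w{3} +\d+ \d{2}:\d{2})\] (.*)$")


def find_restarts(log_text):
    """Return (worker, timestamp) for every 'Restarting worker' line."""
    restarts = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m and m.group(3).startswith("Restarting worker"):
            restarts.append((int(m.group(1)), m.group(2)))
    return restarts


def crashed_after_restart(log_text):
    """True if a 'Segmentation fault' line appears after a restart message."""
    seen_restart = False
    for line in log_text.splitlines():
        text = line.strip()
        m = LINE_RE.match(text)
        if m and m.group(3).startswith("Restarting worker"):
            seen_restart = True
        elif text == "Segmentation fault" and seen_restart:
            return True
    return False


sample = """\
[Worker #12 Jan 25 08:35] Available memory is 9095MB.
[Worker #9 Jan 25 08:35] Restarting worker with new memory settings.
[Worker #8 Jan 25 08:35] PolyH built.  Time: 1.070 sec.
Segmentation fault
"""
print(find_restarts(sample))
print(crashed_after_restart(sample))
```

A restart followed by a crash only shows the two events occurred in that order in the log; it does not prove the restart caused the crash.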
Old 2023-01-25, 13:03   #137
Denial140
 
Dec 2021

2⁴·5 Posts

I have recently been testing out stage 1 ECM capabilities on GPU using the CGBN-enabled GMP-ECM. When I tried running stage 2, both in GMP-ECM and on mprime to the same bounds, I found that mprime was significantly faster. This was true even when I (think I successfully) built GMP-ECM with gwnum (v29.8b7) linked. I believe I compiled successfully because there was a (small but) noticeable speedup compared to the non-gwnum version of GMP-ECM. It may be worth noting that I used a lower B1 on mprime, but I would have expected this to increase the stage 2 time rather than the other way round.
(The other quite likely possibility is that I don't know what I'm doing and so ecm isn't actually using gwnum despite being linked at compile time.)

Does anyone know of a way of using GMP-ECM stage 1 savefiles in mprime for stage 2, or whether such a thing would be feasible? This could presumably be done using a script, but I don't know how the mprime savefiles are formatted to convert from one to the other. I know such an option is available the other way round with gmpecmhook, but I don't recall ever seeing something this way round.


EDIT: I think the reason for the discrepancy between gwnum-linked and not-linked was me misremembering system load during the timings - based on the Fgw.c code for GMP-ECM and some testing, it seems that gwnum is used only for stage 1, which explains the timings for stage 2. I guess that would make the above useful, if possible.
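As a starting point for such a script, here is a minimal Python sketch that reads one line of a GMP-ECM save file into a dictionary. It assumes the semicolon-separated `KEY=value` layout that GMP-ECM save files commonly show (fields such as `METHOD`, `SIGMA`, `B1`, `N`, `X`); the sample line is invented, and nothing here writes the mprime format, which I don't know.

```python
def parse_ecm_save_line(line):
    """Split one 'KEY=value; KEY=value; ...' line into a dict of strings."""
    fields = {}
    for part in line.strip().split(";"):
        part = part.strip()
        if not part:
            continue  # skip the empty piece after a trailing ';'
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"malformed field: {part!r}")
        fields[key.strip()] = value.strip()
    return fields


def residue_as_int(fields):
    """Return the stage 1 residue X as an integer."""
    # int(..., 0) accepts both decimal and 0x-prefixed hexadecimal text.
    return int(fields["X"], 0)


# Invented sample line, only to show the assumed layout.
sample = "METHOD=ECM; SIGMA=12345; B1=1000000; N=1000003; X=0x1f; PROGRAM=example;"
fields = parse_ecm_save_line(sample)
print(fields["B1"], residue_as_int(fields))
```

The hard part, mapping these fields onto whatever mprime expects at the start of stage 2, is still the open question.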

Last fiddled with by Denial140 on 2023-01-25 at 13:32
Old 2023-01-25, 18:00   #138
Prime95
P90 years forever!
 
 
Aug 2002
Yeehaw, FL

17·487 Posts

Quote:
Originally Posted by bur View Post
Unfortunately it still crashes:

Code:
[Worker #9 Jan 25 08:35] Restarting worker with new memory settings.
This scares me. Restarting means writing a save file. If worker #9 is in stage 2 that could easily lead to a crash.

For now, you'll need to cap each worker's memory limit. This is far from ideal. I'm contemplating a different way to handle a situation like yours. Perhaps 16 workers all run stage 1; when one reaches stage 2, all 16 stop and one worker runs stage 2 using 16 threads.
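A rough way to pick a per-worker cap is to split the usable RAM evenly. The Python sketch below is only that arithmetic: the function name, the 32 GB total and the 2 GB reserve are hypothetical values for illustration, not anything mprime computes.

```python
def per_worker_cap_mb(total_mb, workers, reserve_mb=2048):
    """Even split of RAM across workers, leaving reserve_mb for the OS."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    usable = total_mb - reserve_mb
    if usable <= 0:
        raise ValueError("reserve exceeds total memory")
    return usable // workers


# Hypothetical machine: 32 GB of RAM shared by 16 workers.
print(per_worker_cap_mb(32 * 1024, 16))  # (32768 - 2048) // 16 = 1920
```

An even split is conservative: it assumes every worker could be in stage 2 at the same time.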
Old 2023-01-25, 18:50   #139
bur
 
 
Aug 2020
79*6581e-4;3*2539e-3

2·5·73 Posts

I tried capping the memory, it still restarted occasionally. For now I'm simply running ECM on much smaller exponents.

Quote:
This is far from ideal. I'm contemplating a different way to handle a situation like yours.
This sounds like it's not possible to prevent the crash when a worker is restarted during stage 2. Is it really like that?

If so, couldn't mprime just *not* restart the worker but make do with the available memory?

Also, regarding multithreaded ECM, is that efficient for large exponents? I never thought of multithreading ECM, but that's simply because I mostly use it for 150-200 digit numbers. I could just ECM that million-bit number with 12 threads in that case - it should prevent restarting, correct?
Old 2023-01-25, 20:13   #140
James Heinrich
 
 
"James Heinrich"
May 2004
ex-Northern Ontario

4277₁₀ Posts

Quote:
Originally Posted by Prime95 View Post
30.10 build 3
After quick testing, Linux64 seems to get CERT assignments again, thank you.
Win64 seems to start up ok in a fresh unzip, but an in-place upgrade over v30.8b15 instant-crashes (I haven't investigated further).

Last fiddled with by James Heinrich on 2023-01-25 at 20:13
Old 2023-01-25, 20:15   #141
Prime95
P90 years forever!
 
 
Aug 2002
Yeehaw, FL

17·487 Posts

Quote:
Originally Posted by bur View Post
I tried capping the memory, it still restarted occasionally. For now I'm simply running ECM on much smaller exponents.

This sounds like it's not possible to prevent the crash when a worker is restarted during stage 2. Is it really like that?

If so, couldn't mprime just *not* restart the worker but make do with the available memory?

Also, regarding multithreaded ECM, is that efficient for large exponents? I never thought of multithreading ECM, but that's simply because I mostly use it for 150-200 digit numbers. I could just ECM that million-bit number with 12 threads in that case - it should prevent restarting, correct?
The problem is not restarting (well, maybe if saving during stage 2 got far enough to create a corrupt save file). The crash is likely in creating the save file during stage 2.

Long term, stage 2 save files will be supported but take a long time to create. I either have to create save files that are GBs in size or spend several minutes reducing the data down to size (the same as when ECM stage 2 finishes).

Yes, you can run ECM on multi-million bit numbers with 1 worker and 12 threads.
Old 2023-01-26, 06:54   #142
bur
 
 
Aug 2020
79*6581e-4;3*2539e-3

2DA₁₆ Posts

Quote:
The crash is likely in creating the save file during stage 2
Why does the problem only manifest when I see a restart? I thought a save file was generated even without restarting.

Is it possible to disable saving in stage 2? At least for the time being?

I'll go with multithreading for now. Are there any guidelines as to how many threads give the highest throughput?

(I know I need exactly 1 worker with 12 threads in my case anyway, but I'd like to know if it's inefficient or not)

Last fiddled with by bur on 2023-01-26 at 06:56
Old 2023-01-26, 10:01   #143
Rubiksmath
 
Sep 2022

53 Posts

Quote:
Originally Posted by Prime95 View Post
Perhaps 16 workers all run stage 1; when one reaches stage 2, all 16 stop and one worker runs stage 2 using 16 threads.
I quite like this idea. It would likely prevent crashes and give stage 2 a nice boost. Yes, it would (probably) lose throughput (unless stage 2 is boosted more than I am thinking), but unlike P-1 stage 2, ECM stage 2 takes up very little time compared to stage 1. That is why I like the idea: you won't get as big a backlog as you do with P-1 stage 2 if you do it in this fashion.

As a side note, it was mentioned with these builds that, for now, the runtime estimation and RAM B2 selection are probably inaccurate, and for me it generally overestimates stage 2 runtime. It pegs stage 2 at about 0.3x stage 1, but in reality it is closer to 0.1x. Perhaps I could manually set a B2 value roughly triple what it recommends to bring it to 0.3x, but I don't know what runtime ratio is optimal for factor chance/throughput.
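The "roughly triple" figure can be written out as a small Python sketch. It rests on a loud assumption that stage 2 runtime grows in proportion to B2, which may well not hold for this stage 2 implementation. The function name is made up, and the B2 value is just the one from bur's log earlier in the thread.

```python
def scaled_b2(recommended_b2, observed_ratio, target_ratio):
    """Scale B2 so the stage 2 / stage 1 runtime ratio moves to target_ratio.

    ASSUMPTION: stage 2 runtime is proportional to B2. This is a crude
    model chosen for illustration, not how the program picks its bounds.
    """
    if observed_ratio <= 0:
        raise ValueError("observed_ratio must be positive")
    return round(recommended_b2 * target_ratio / observed_ratio)


# Program recommends B2 = 1562000000; stage 2 actually takes 0.1x stage 1
# and the goal is 0.3x.
print(scaled_b2(1_562_000_000, 0.1, 0.3))  # 4686000000, i.e. triple
```

If stage 2 cost grows more slowly than B2, the B2 needed to reach a 0.3x ratio would be larger than this estimate.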

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
