#133
Aug 2020
79*6581e-4;3*2539e-3
2·5·73 Posts
Quote:
edit: Works very well so far. No crashes, and total memory usage has roughly doubled, to 28 GB from about 14 GB previously. Or is that caused by the simultaneous switch to a larger exponent?
Quote:
Last fiddled with by bur on 2023-01-23 at 07:28

#134
Aug 2020
79*6581e-4;3*2539e-3
730₁₀ Posts
Update: Unless I'm using very conservative Memory=x settings, it eventually still tries to restart a worker and crashes. So I'll wait for 30.10... :)

#135
P90 years forever!
Aug 2002
Yeehaw, FL
2057₁₆ Posts
30.10 build 3
PRE-BETA software!

Since last build the occasional crash at stage 2 init is fixed.

Caveats? Same as previous builds:
Save files during ECM stage 2 are still broken (probably a crash)
Stage 2 time estimates and optimal B2 bounds could be off
Accurate estimates of stage 2 memory consumed may be off
Further stage 2 multithreading improvements are needed.
Stage 2 is pretty verbose, there's lots of code cleanup in my future.

Windows 64-bit: https://mersenne.org/ftp_root/gimps/...10b3.win64.zip
Linux 64-bit: https://mersenne.org/ftp_root/gimps/...linux64.tar.gz

Last fiddled with by Prime95 on 2023-01-25 at 05:47

#136
Aug 2020
79*6581e-4;3*2539e-3
730₁₀ Posts
Unfortunately it still crashes:
Code:
[Worker #12 Jan 25 08:35] Stage 1 complete. 4177752 transforms, 1 modular inverses. Total time: 395.228 sec.
[Worker #12 Jan 25 08:35] Available memory is 9095MB.
[Worker #12 Jan 25 08:35] Switching to FMA3 FFT length 72K, Pass1=384, Pass2=192, clm=2
[Worker #12 Jan 25 08:35] Optimal B2 is 1562*B1 = 1562000000. Actual B2 will be 1562145585.
[Worker #12 Jan 25 08:35] Estimated stage 2 vs. stage 1 runtime ratio: 0.315
[Worker #9 Jan 25 08:35] Restarting worker with new memory settings.
[Worker #8 Jan 25 08:35] PolyG built. Time: 2.507 sec.
[Worker #4 Jan 25 08:35] M1281979 curve 2 stage 1 at prime 640957 [64.09%]. Time: 109.104 sec.
[Worker #7 Jan 25 08:35] PolyG built. Time: 1.653 sec.
[Worker #8 Jan 25 08:35] PolyH built. Time: 1.070 sec.
Segmentation fault
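As a sanity check on the figures in that log (not part of the original post): the B2 line implies B1 = 1,000,000, and applying the 0.315 ratio to the 395.228 sec stage 1 time gives an estimated stage 2 of roughly 124.5 sec. A minimal Python sketch of that arithmetic follows. The `summarize` helper and its regexes are my own and are only known to match the three Worker #12 lines copied here, not mprime output in general.

```python
import re

# Three Worker #12 lines copied verbatim from the log above.
LOG = """\
[Worker #12 Jan 25 08:35] Stage 1 complete. 4177752 transforms, 1 modular inverses. Total time: 395.228 sec.
[Worker #12 Jan 25 08:35] Optimal B2 is 1562*B1 = 1562000000. Actual B2 will be 1562145585.
[Worker #12 Jan 25 08:35] Estimated stage 2 vs. stage 1 runtime ratio: 0.315
"""


def summarize(log: str) -> dict:
    """Pull stage 1 time, implied B1 and estimated stage 2 time out of the log text."""
    stage1_sec = float(re.search(r"Total time: ([\d.]+) sec", log).group(1))
    multiplier, optimal_b2 = map(
        int, re.search(r"Optimal B2 is (\d+)\*B1 = (\d+)", log).groups()
    )
    ratio = float(re.search(r"runtime ratio: ([\d.]+)", log).group(1))
    return {
        "stage1_sec": stage1_sec,
        "b1": optimal_b2 // multiplier,        # B1 implied by "1562*B1 = 1562000000"
        "est_stage2_sec": stage1_sec * ratio,  # mprime's own estimate, applied to stage 1 time
    }


if __name__ == "__main__":
    for key, value in summarize(LOG).items():
        print(f"{key}: {value}")
```

The estimate only restates what mprime printed; the log ends in a segmentation fault before any actual stage 2 time is reported.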

#137
Dec 2021
2⁴·5 Posts
I have recently been testing stage 1 ECM on GPU using the CGBN-enabled GMP-ECM. When I ran stage 2 to the same bounds in both GMP-ECM and mprime, I found that mprime was significantly faster. This was true even when I (think I successfully) built GMP-ECM with gwnum (v29.8b7) linked. I believe the build worked because there was a (small but) noticeable speedup compared to the non-gwnum version of GMP-ECM. It may be worth noting that I used a lower B1 on mprime, but I would have expected this to increase the stage 2 time rather than the other way round.

(The other quite likely possibility is that I don't know what I'm doing, and so ecm isn't actually using gwnum despite being linked at compile time.)

Does anyone know of a way of using GMP-ECM stage 1 savefiles in mprime for stage 2, or whether such a thing would be feasible? This could presumably be done using a script, but I don't know how the mprime savefiles are formatted, so I can't convert from one to the other. I know such an option is available the other way round with gmpecmhook, but I don't recall ever seeing something this way round.

EDIT: I think the reason for the discrepancy between gwnum-linked and not-linked was me misremembering system load during the timings - based on the Fgw.c code for GMP-ECM and some testing, it seems that gwnum is used only for stage 1, which explains the timings for stage 2. I guess that would make the above useful, if possible.

Last fiddled with by Denial140 on 2023-01-25 at 13:32

#138
P90 years forever!
Aug 2002
Yeehaw, FL
17·487 Posts
Quote:
For now, you'll need to cap each worker's memory limit. This is far from ideal. I'm contemplating a different way to handle a situation like yours. Perhaps 16 workers all run stage 1, when one reaches stage 2 all 16 stop and one worker runs stage 2 using 16 threads.
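To picture the contemplated scheme, here is a toy timeline in Python. It is only a sketch of the idea as described in this post, not how mprime works or will work: the `simulate` function, the one-curve-per-worker simplification and the fixed stage 2 duration are all assumptions made for illustration.

```python
def simulate(stage1_secs, stage2_secs):
    """Toy model: all workers run stage 1 in parallel; whenever one finishes,
    every worker pauses while that curve's stage 2 runs alone on all threads.

    stage1_secs -- stage 1 duration per worker (worker ids start at 1)
    stage2_secs -- assumed stage 2 duration when it has every thread to itself
    Returns a list of (stage2_start, stage2_end, worker_id) tuples.
    """
    remaining = dict(enumerate(stage1_secs, start=1))  # worker id -> stage 1 seconds left
    clock = 0.0
    timeline = []
    while remaining:
        # The worker with the least stage 1 work left reaches stage 2 first.
        worker = min(remaining, key=remaining.get)
        step = remaining.pop(worker)
        for other in remaining:
            remaining[other] -= step  # the others progressed by the same wall time
        clock += step
        # Everyone stops; a single stage 2 runs, so only one stage 2 needs memory.
        timeline.append((clock, clock + stage2_secs, worker))
        clock += stage2_secs
    return timeline


if __name__ == "__main__":
    for start, end, worker in simulate([10, 30, 20], 5):
        print(f"worker {worker}: stage 2 from {start} to {end}")
```

The point of the model is that at most one stage 2 is ever active, so there is never a second worker asking for a share of the stage 2 memory and no reason to restart anyone with new memory settings.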

#139
Aug 2020
79*6581e-4;3*2539e-3
2·5·73 Posts
I tried capping the memory, it still restarted occasionally. For now I'm simply running ECM on much smaller exponents.
Quote:
If so, couldn't mprime just *not* restart the worker but make do with the available memory?

Also, regarding multithreaded ECM, is that efficient for large exponents? I never thought of multithreading ECM, but that's simply because I mostly use it for 150-200 digit numbers. In that case I could just run ECM on that million-bit number with 12 threads - that should prevent restarting, correct?

#140
"James Heinrich"
May 2004
ex-Northern Ontario
4277₁₀ Posts
After quick testing, Linux64 seems to get CERT assignments again, thank you.
Win64 seems to start up ok in a fresh unzip, but an in-place upgrade over v30.8b15 instant-crashes (I haven't investigated further).

Last fiddled with by James Heinrich on 2023-01-25 at 20:13

#141
P90 years forever!
Aug 2002
Yeehaw, FL
17·487 Posts
Quote:
Long term, stage 2 save files will be supported but take a long time to create. I either have to create save files that are GBs in size or spend several minutes reducing the data down to size (the same as when ECM stage 2 finishes).

Yes, you can run ECM on multi-million bit numbers with 1 worker and 12 threads.

#142
Aug 2020
79*6581e-4;3*2539e-3
2DA₁₆ Posts
Quote:
Is it possible to disable saving in stage 2? At least for the time being?

I'll go with the multithreading for now. Are there any guidelines on how many threads give the highest throughput? (I know I need exactly 1 worker with 12 threads in my case anyway, but I'd like to know whether it's inefficient or not.)

Last fiddled with by bur on 2023-01-26 at 06:56

#143
Sep 2022
53 Posts
Quote:
As a side note, it was mentioned with these builds that for now the runtime estimation and RAM B2 selection are probably inaccurate, and for me it generally overestimates stage 2 runtime. It pegs it at about 0.3x stage 1, but in reality it is closer to 0.1x stage 1. Perhaps I could manually set a B2 value roughly triple what it recommends to put it at 0.3x, but I don't know what runtime is optimal for factor chance/throughput.
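The "roughly triple" figure rests on stage 2 time growing in direct proportion to B2, which nothing in this thread confirms. A small Python sketch makes that assumption explicit. The function name and the `exponent` parameter are hypothetical: `exponent=1.0` is the linear case assumed in the post, and any other value is just a what-if.

```python
def b2_multiplier_for_target(observed_ratio, target_ratio, exponent=1.0):
    """Factor to multiply B2 by so the stage 2 / stage 1 ratio moves from
    observed_ratio to target_ratio, ASSUMING stage 2 time ~ B2 ** exponent.

    exponent=1.0 is the linear assumption behind "roughly triple";
    the real scaling in mprime is not given in the thread.
    """
    return (target_ratio / observed_ratio) ** (1.0 / exponent)


if __name__ == "__main__":
    # Observed about 0.1x, estimate (and target) about 0.3x, as in the post.
    print(b2_multiplier_for_target(0.1, 0.3))                # linear assumption
    print(b2_multiplier_for_target(0.1, 0.3, exponent=0.5))  # what-if: slower growth
```

If stage 2 time grows more slowly than B2, the multiplier needed to reach 0.3x is larger than three, so the linear case should be read as a lower-end guess rather than a recommendation.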