mersenneforum.org ECM users - version 30.9/30.10 (see post#168)

2022-07-04, 08:22   #12
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

2³×1,021 Posts

Quote:
 Originally Posted by Glenn I had a single round-off error over 0.4 on a recent test of M113782777, but still completed that test successfully. Since then my tests (two more so far) have included a “round off:” column in my results for each group of 10,000 iterations. Please let me know when it’s safe to use 30.9.
The single roundoff error above 0.4 is completely normal. Continue to use 30.8.
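For context on what that "round off" value measures: in FFT-based multiplication every output of the inverse transform should land on an integer, and the round-off error is the largest distance of any output from its nearest integer. At 0.5 the rounding could go to the wrong integer, so values creeping toward 0.5 are the warning sign. The sketch below illustrates the idea with a plain zero-padded complex FFT in pure Python. It is not prime95's code (prime95 uses a far more elaborate irrational-base weighted transform with balanced digits), and `fft_square` and its `bits` parameter are illustrative names only.

```python
import cmath


def fft(a: list[complex], invert: bool = False) -> list[complex]:
    """Recursive radix-2 FFT; len(a) must be a power of two. Unscaled."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2], invert), fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def fft_square(x: int, bits: int = 16) -> tuple[int, float]:
    """Square a non-negative x via a floating-point FFT.

    Returns (x*x, max round-off error). Illustrative only: a plain
    zero-padded transform, not prime95's weighted (IBDWT) FFT.
    """
    mask = (1 << bits) - 1
    digits = []
    while x:
        digits.append(x & mask)
        x >>= bits
    n = 1
    while n < 2 * len(digits):  # zero-pad so the cyclic product does not wrap
        n *= 2
    spec = fft([complex(d) for d in digits] + [0j] * (n - len(digits)))
    conv = [v.real / n for v in fft([s * s for s in spec], invert=True)]
    # Every exact convolution term is an integer; the round-off error is the
    # worst distance from one. Near 0.5, rounding may pick the wrong integer.
    roundoff = max(abs(c - round(c)) for c in conv)
    # Shifting each rounded term into place and summing performs the carries.
    square = sum(round(c) << (bits * i) for i, c in enumerate(conv))
    return square, roundoff


sq, err = fft_square(3**2000)
print(sq == 3**4000, f"max round-off {err:.1e}")
```

Raising `bits` packs more of the number into each FFT element, so the convolution sums get larger and the reported round-off should grow with them; picking an FFT length is essentially a trade-off against that error.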

2022-07-04, 09:29   #13
Glenn
"Glenn Leider"

Apr 2021
Carlsbad, CA

33₈ Posts

Okay, I will continue to use the latest release of 30.8 for now. Thanks for the quick response.

2022-07-04, 11:17   #14
kruoli
"Oliver"

Sep 2017
Porta Westfalica, DE

2·11·61 Posts

An example from one of my machines (Intel Atom):
Code:
[Worker #2 Jul 4 12:50] Stage 1 complete. 1910782 transforms, 1 modular inverses. Total time: 1122.019 sec.
[Worker #2 Jul 4 12:50] Available memory is 4092MB.
[Worker #2 Jul 4 12:50] Optimal B2 is 1017*B1 = 254250000. Actual B2 will be 254268105.
[Worker #2 Jul 4 12:50] Estimated stage 2 vs. stage 1 runtime ratio: 0.357
[Worker #2 Jul 4 12:50] Setting affinity to run helper thread 1 on CPU core #4
[Worker #2 Jul 4 12:50] Using 3932MB of memory. D: 19110, degree-2016 polynomials. Ftree polys in memory: 2
[Worker #2 Jul 4 12:50] Setting affinity to run polymult helper thread on CPU core #4
[Worker #2 Jul 4 12:52] Stage 2 init complete. 123530 transforms, 1 modular inverses. Time: 120.117 sec.
[Worker #2 Jul 4 12:53] PolyG built. Time: 67.360 sec.
[Worker #2 Jul 4 12:53] M675347 stage 2 at B2=61639305 [16.66%]. Time: 0.000 sec.
[Worker #2 Jul 4 12:54] PolyG built. Time: 67.692 sec.
[Worker #2 Jul 4 12:54] PolyH built. Time: 36.737 sec.
[Worker #2 Jul 4 12:54] M675347 stage 2 at B2=100165065 [33.33%]. Time: 0.000 sec.
[Worker #2 Jul 4 12:56] PolyG built. Time: 67.937 sec.
[Worker #2 Jul 4 12:56] PolyH built. Time: 35.953 sec.
[Worker #2 Jul 4 12:56] M675347 stage 2 at B2=138690825 [49.99%]. Time: 0.000 sec.
[Worker #2 Jul 4 12:57] PolyG built. Time: 68.429 sec.
[Worker #2 Jul 4 12:58] PolyH built. Time: 37.144 sec.
[Worker #2 Jul 4 12:58] M675347 stage 2 at B2=177216585 [66.66%]. Time: 0.000 sec.
[Worker #2 Jul 4 12:59] PolyG built. Time: 67.681 sec.
[Worker #2 Jul 4 13:00] PolyH built. Time: 36.372 sec.
[Worker #2 Jul 4 13:00] M675347 stage 2 at B2=215742345 [83.33%]. Time: 0.000 sec.
[Worker #2 Jul 4 13:01] PolyG built. Time: 69.471 sec.
[Worker #2 Jul 4 13:01] PolyH built. Time: 36.607 sec.
[Worker #2 Jul 4 13:02] H(X) scaled. Time: 9.851 sec.
[Worker #2 Jul 4 13:02] PolyF up. Time: 9.244 sec.
[Worker #2 Jul 4 13:02] PolyF down. Time: 21.425 sec.
[Worker #2 Jul 4 13:02] PolyF up. Time: 26.766 sec.
[Worker #2 Jul 4 13:02] PolyF down. Time: 41.547 sec.
[Worker #2 Jul 4 13:03] gg = mul H(X). Time: 2.540 sec.
[Worker #2 Jul 4 13:03] Stage 2 complete. 711687 transforms, 6 modular inverses. Total time: 702.843 sec.
[Worker #2 Jul 4 13:03] Stage 2 GCD complete. Time: 0.240 sec.

The resulting B2 is about eight times higher than before, and stage 2 is also about 80 to 100% faster now!

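To compare runs like this one, the stage totals can be pulled out of the log mechanically. A minimal Python sketch, assuming the log lines keep exactly the wording shown in the excerpt above (the regexes are written against this excerpt, not against any documented log format); it contrasts prime95's up-front estimate of the stage 2 / stage 1 runtime ratio with the ratio actually measured:

```python
import re

# Three lines copied verbatim from the log above.
LOG = """\
[Worker #2 Jul 4 12:50] Stage 1 complete. 1910782 transforms, 1 modular inverses. Total time: 1122.019 sec.
[Worker #2 Jul 4 12:50] Estimated stage 2 vs. stage 1 runtime ratio: 0.357
[Worker #2 Jul 4 13:03] Stage 2 complete. 711687 transforms, 6 modular inverses. Total time: 702.843 sec.
"""

# Matches "Stage N complete. ... Total time: X sec." but not the
# "Stage 2 init complete" or "Stage 2 GCD complete" lines.
TOTAL_RE = re.compile(r"Stage (\d) complete\..*Total time: ([\d.]+) sec\.")
ESTIMATE_RE = re.compile(r"runtime ratio: ([\d.]+)")


def stage_ratios(log: str) -> tuple[float, float]:
    """Return (estimated, measured) stage 2 / stage 1 runtime ratio."""
    totals: dict[int, float] = {}
    estimated = float("nan")
    for line in log.splitlines():
        if m := TOTAL_RE.search(line):
            totals[int(m.group(1))] = float(m.group(2))
        elif m := ESTIMATE_RE.search(line):
            estimated = float(m.group(1))
    return estimated, totals[2] / totals[1]


est, measured = stage_ratios(LOG)
print(f"estimated {est:.3f}, measured {measured:.3f}")  # estimated 0.357, measured 0.626
```

On this excerpt stage 2 took roughly 0.63 of the stage 1 time, noticeably more than the 0.357 prime95 estimated before starting stage 2.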
2022-07-04, 20:35   #15
kruoli
"Oliver"

Sep 2017
Porta Westfalica, DE

2·11·61 Posts

At least on this version, a benchmark on the default FFT 120K with 4 or 8 workers on an 11700KF hangs indefinitely. (Linux.) 1 or 2 workers run fine.

Last fiddled with by kruoli on 2022-07-04 at 20:37 Reason: Grammar.

2022-07-04, 22:27   #16
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

2³·1,021 Posts

Quote:
 Originally Posted by kruoli At least on this version, a benchmark on the default FFT 120K with 4 or 8 workers on an 11700KF hangs indefinitely. (Linux.) 1 or 2 workers run fine.

2022-07-05, 01:42   #17
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

1111111101000₂ Posts

Quote:
 Originally Posted by kruoli At least on this version, a benchmark on the default FFT 120K with 4 or 8 workers on an 11700KF hangs indefinitely. (Linux.) 1 or 2 workers run fine.
I'm not having any luck replicating this.

2022-07-05, 17:30   #18
kruoli
"Oliver"

Sep 2017
Porta Westfalica, DE

2476₈ Posts

Weird. I was not able to replicate it today, either. Hopefully it's nothing to worry about.

Last fiddled with by kruoli on 2022-07-05 at 18:23 Reason: Spelling.

2022-07-12, 03:53   #19
masser

Jul 2003
Behind BB

11110110100₂ Posts

I know this isn't the preferred use case, but I got this segfault tonight running a P-1 with 30.9. See the attached screenshot. Here's the worktodo.txt assignment line:
Pminus1=1,2,32599673,-1,180000,0,68,"782392153,45051795327956903,166007139391952858287"

Last fiddled with by masser on 2022-07-12 at 03:54

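When reporting a crash like this it can help to decode the assignment. The sketch below splits the line under an assumed layout of k,b,n,c,B1,B2, then any extra numeric fields, then an optional quoted list of known factors, with the number under test being k·b^n+c (here 2^32599673−1). I believe the trailing 68 is how far the number was trial factored, in bits, but treat that reading as an assumption; the sketch also assumes the line carries no assignment ID. `parse_pminus1` is a hypothetical helper, not part of prime95. As a sanity check it uses the fact that, for prime p, every divisor of 2^p−1 is ≡ 1 (mod 2p), and confirms each factor by modular exponentiation.

```python
import re

LINE = 'Pminus1=1,2,32599673,-1,180000,0,68,"782392153,45051795327956903,166007139391952858287"'


def parse_pminus1(line: str) -> dict:
    """Hypothetical parser for a Pminus1= worktodo line.

    Assumed layout: k,b,n,c,B1,B2[,extra numeric fields][,"known factors"],
    with no leading assignment ID; the number under test is k*b^n+c.
    """
    body = line.split("=", 1)[1]
    factors: list[int] = []
    quoted = re.search(r',"([^"]*)"$', body)
    if quoted:
        factors = [int(f) for f in quoted.group(1).split(",") if f]
        body = body[: quoted.start()]
    fields = [int(f) for f in body.split(",")]
    k, b, n, c, b1, b2 = fields[:6]
    return {"k": k, "b": b, "n": n, "c": c, "B1": b1, "B2": b2,
            "extra": fields[6:], "factors": factors}


entry = parse_pminus1(LINE)
p = entry["n"]  # here k=1, b=2, c=-1, so the number is 2^p - 1
for f in entry["factors"]:
    # For prime p, any divisor of 2^p-1 is 1 mod 2p; pow() confirms it divides.
    print(f, (f - 1) % (2 * p) == 0, pow(2, p, f) == 1)
```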
2022-07-13, 10:00   #20
lycorn

"GIMFS"
Sep 2002
Oeiras, Portugal

1571₁₀ Posts

Quote:
 Originally Posted by Prime95 Interestingly, in my limited testing it seems that, unlike P-1, B2 > ~1000 * B1 does not make sense. You are better off running more curves or increasing B1. In some situations, prime95 elects not to use all the available memory -- this will mean the default MaxHighMemWorkers setting might need to change.
I started doing some ECM work using 30.9. If I understood correctly, only stage 2 has benefited from the improvements, so I didn't quite understand why you wrote there's no point in using B2 > ~1000·B1.
I am running some tests on very small exponents, since I remember from work done years ago that these were the ones that benefited most from GMP-ECM. For M4567, the ECM Progress page indicates ~20k curves already run with B1 = 110,000,000. I started with B1 = 150,000,000 and B2 ≈ 1000·B1, but the Stage 2 running time was ridiculously small compared to Stage 1's.
After some adjustments I settled on B1 = 110,000,000 and B2 = 6e13. On a 4-core i5-7400 @ 3.3 GHz, with 27.5 GB available to Prime95, Stage 1 takes 16 minutes and Stage 2 takes 12 minutes. This seems to be in line with a recommendation for the ideal ratio of running times that I recall from previous work (Stage 2 taking ~70% of the Stage 1 running time).
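Putting those figures side by side: the chosen B2 sits several hundred times beyond the ~1000·B1 mark mentioned in the quote, and 12 versus 16 minutes is a stage 2 / stage 1 ratio of 75%. A quick Python check using only the numbers given in the post:

```python
# Figures quoted in the post above (M4567 on an i5-7400).
B1 = 110_000_000
B2 = 6e13
stage1_min, stage2_min = 16, 12

print(f"B2 = {B2 / B1:,.0f} x B1")                           # B2 = 545,455 x B1
print(f"stage 2 / stage 1 = {stage2_min / stage1_min:.0%}")  # stage 2 / stage 1 = 75%
```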
So far the software has been pretty stable, and multithreading appears to be working fine. As for whether Prime95 uses all the available memory, I found that it depends solely on the B2 used: with B2 = 1.5e12, stage 2 used ~50% of the allowed memory; with B2 = 3e13, ~70%; and with the current B2 = 6e13 it is using 26.3 GB.
Any idea/recommendation/suggestion for exponent size and/or bounds?

2022-07-13, 14:52   #21
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

2³×1,021 Posts

Quote:
 Originally Posted by lycorn I didn´t quite understand why you wrote there´s no point in using B2 > ~1000B1. Any idea/recommendation/suggestion for exponent size and/or bounds?
If you use B2=100*B1, prime95 should select a close-to-optimal B2 (if there aren't any bugs).

The best B2 value is the one that gives the shortest time to complete the "t" level you are working on.
Say, you are working on t70. Run gmp-ecm to get the number of curves required for the three different B2 values you tried (I think it is the -v switch). Compute prime95-runtime * number of curves required. The B2 with the smallest total runtime is the winner.
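That rule of thumb amounts to a one-line minimisation: for each candidate B2, multiply prime95's runtime per curve (stage 1 plus stage 2) by the expected number of curves GMP-ECM reports for the target t-level, and keep the B2 with the smallest product. A Python sketch; the `Candidate` class is illustrative and all the numbers in `trials` are made-up placeholders, not measurements from this thread:

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    b2: float                 # B2 bound tried in prime95
    seconds_per_curve: float  # prime95 time for one curve, stage 1 + stage 2
    curves_needed: float      # expected curves for the t-level (e.g. from GMP-ECM -v)

    def expected_seconds(self) -> float:
        return self.seconds_per_curve * self.curves_needed


def best_b2(candidates: list[Candidate]) -> Candidate:
    """Return the candidate with the smallest expected time to finish the t-level."""
    return min(candidates, key=Candidate.expected_seconds)


# Made-up placeholder numbers, only to show the mechanics.
trials = [
    Candidate(b2=1e11, seconds_per_curve=1000.0, curves_needed=20000),
    Candidate(b2=3e12, seconds_per_curve=1200.0, curves_needed=15000),
    Candidate(b2=6e13, seconds_per_curve=1700.0, curves_needed=12000),
]
winner = best_b2(trials)
print(f"best B2 = {winner.b2:.1e}")  # best B2 = 3.0e+12
```

Stage 1 time belongs in the per-curve figure: a smaller B2 saves stage 2 time but raises the curve count, and every extra curve pays the full stage 1 cost again.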

Please report back your findings. My 1000*B1 guesstimate was based on the rapidly diminishing returns for larger and larger B2 multipliers.

2022-07-14, 14:34   #22
lycorn
"GIMFS"

Sep 2002
Oeiras, Portugal

11000100011₂ Posts

I did some more tests with 30.9. First of all, the test conditions:
Exponent: 4567
FFT Size: 256 bytes
Hardware: i5-7400 (4 cores, no HT) @ 3.3 GHz
Available memory (DDR4-3200): 27.5 GB
Number of workers: 1 (4 cores allowed)
B1 bound: 110M (t55)
Average runtime for Stage 1: 950 sec (just under 16 minutes)
B2 bounds: several large bounds (1.5e13, 3e13, 6e13), then 1e11 (1000 * B1), and some smaller bounds down to 105 * B1. Finally I tried B2 = 100 * B1 to see what P95 would choose.

For each of the B2 bounds I used GMP-ECM to get the expected number of curves to complete a t55. The results were pretty much in line with George's latest post in this thread, in that large bounds have diminishing returns in terms of the time taken to complete the t55. What I found a bit weird was that as I kept lowering B2, the times kept getting better, even for B2 as low as 105 * B1. It would seem that "the lower the better". For this particular value I got a Stage 2 runtime of 5.3 sec, nearly 180 times less than Stage 1's, and a memory utilization of only 738 MB. When I used B2 = 100 * B1, so that P95 would choose the optimal B2, it picked 28,217 * B1, approximately 3.1e12, and the results, in terms of time to complete a t55, were much worse than with all the lower B2 values tried.

So my point is: does this really work this way? Is the time to complete the t55, given by the number of curves × the runtime at that B2, the only criterion to take into account when choosing an optimal B2 value? Granted, the time each curve takes goes down when we lower B2, but so does the chance of finding a factor, so it seems some sort of compromise should be reached. And when we look at the value chosen by P95 (much larger than values that were yielding lower completion times), we tend to think that must be the case.

I admit I'm a bit confused by the results. In particular, it's a bit difficult to swallow that a run taking 950 sec for Stage 1 and 5.3 sec for Stage 2, and using just 738 MB of memory, was the "best" of all.

Next thing is multithreading. When I did the first set of tests, the program reported for stage 2 that 3 polymult helper threads were assigned to cores 2, 3 and 4. That seemed fine. Then, using P95's recommended B2, I fired up 2 workers with 2 threads each, as the maximum memory used would comfortably fit in the 27.5 GB available. In stage 2, each worker was assigned one helper, as expected. Now the time taken to run stage 2 was just 20-25% more than with a single worker that got 3 helper threads. Is that expected, or is the program not taking full advantage of additional helper threads during stage 2, meaning that some multithreading "tweaking" would help? Any comments/guidance would be appreciated. I'll be happy to run more tests if required.
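One way to read those multithreading numbers is as throughput: if each of two workers needs 20 to 25% longer for stage 2 than a single 4-core worker does, the pair still completes about 1.6 times as much stage 2 work per unit of time. A small Python check, looking only at stage 2 as reported above and ignoring any change in stage 1 speed; `relative_throughput` is a hypothetical helper:

```python
def relative_throughput(workers: int, slowdown: float) -> float:
    """Stage 2 work per unit time, relative to one worker using all cores.

    `slowdown` is how much longer each worker's stage 2 takes in the
    multi-worker setup (1.20 = 20% longer). Hypothetical helper.
    """
    return workers / slowdown


for slowdown in (1.20, 1.25):
    # prints 1.67x for a 1.20 slowdown and 1.60x for a 1.25 slowdown
    print(f"{slowdown:.2f}x per worker -> {relative_throughput(2, slowdown):.2f}x throughput")
```

So the two-worker setup looks like the better use of the cores here, which suggests (without proving it) that stage 2 on a single worker does not scale linearly across its helper threads.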

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
