mersenneforum.org  

2022-07-04, 08:22   #12
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
7950₁₀ Posts

Quote:
Originally Posted by Glenn
I had a single round off error over 0.4 on a recent test for M113782777, but successfully completed that test anyway. Since then my tests (two more so far) include a “round off:” column in my results for each group of 10,000 iterations.

Please let me know when it’s safe to use 30.9.

The single roundoff error above 0.4 is completely normal. Continue to use 30.8.

2022-07-04, 09:29   #13
Glenn
"Glenn Leider"
Apr 2021
Carlsbad, CA
1B₁₆ Posts

Okay, will continue to use the latest release of 30.8 for now. Thanks for the quick response.

2022-07-04, 11:17   #14
kruoli
"Oliver"
Sep 2017
Porta Westfalica, DE
1,087 Posts

An example from one of my machines (Intel Atom):
Code:
[Worker #2 Jul 4 12:50] Stage 1 complete. 1910782 transforms, 1 modular inverses. Total time: 1122.019 sec.
[Worker #2 Jul 4 12:50] Available memory is 4092MB.
[Worker #2 Jul 4 12:50] Optimal B2 is 1017*B1 = 254250000.  Actual B2 will be 254268105.
[Worker #2 Jul 4 12:50] Estimated stage 2 vs. stage 1 runtime ratio: 0.357
[Worker #2 Jul 4 12:50] Setting affinity to run helper thread 1 on CPU core #4
[Worker #2 Jul 4 12:50] Using 3932MB of memory.  D: 19110, degree-2016 polynomials.  Ftree polys in memory: 2
[Worker #2 Jul 4 12:50] Setting affinity to run polymult helper thread on CPU core #4
[Worker #2 Jul 4 12:52] Stage 2 init complete. 123530 transforms, 1 modular inverses. Time: 120.117 sec.
[Worker #2 Jul 4 12:53] PolyG built.  Time: 67.360 sec.
[Worker #2 Jul 4 12:53] M675347 stage 2 at B2=61639305 [16.66%].  Time: 0.000 sec.
[Worker #2 Jul 4 12:54] PolyG built.  Time: 67.692 sec.
[Worker #2 Jul 4 12:54] PolyH built.  Time: 36.737 sec.
[Worker #2 Jul 4 12:54] M675347 stage 2 at B2=100165065 [33.33%].  Time: 0.000 sec.
[Worker #2 Jul 4 12:56] PolyG built.  Time: 67.937 sec.
[Worker #2 Jul 4 12:56] PolyH built.  Time: 35.953 sec.
[Worker #2 Jul 4 12:56] M675347 stage 2 at B2=138690825 [49.99%].  Time: 0.000 sec.
[Worker #2 Jul 4 12:57] PolyG built.  Time: 68.429 sec.
[Worker #2 Jul 4 12:58] PolyH built.  Time: 37.144 sec.
[Worker #2 Jul 4 12:58] M675347 stage 2 at B2=177216585 [66.66%].  Time: 0.000 sec.
[Worker #2 Jul 4 12:59] PolyG built.  Time: 67.681 sec.
[Worker #2 Jul 4 13:00] PolyH built.  Time: 36.372 sec.
[Worker #2 Jul 4 13:00] M675347 stage 2 at B2=215742345 [83.33%].  Time: 0.000 sec.
[Worker #2 Jul 4 13:01] PolyG built.  Time: 69.471 sec.
[Worker #2 Jul 4 13:01] PolyH built.  Time: 36.607 sec.
[Worker #2 Jul 4 13:02] H(X) scaled.  Time: 9.851 sec.
[Worker #2 Jul 4 13:02] PolyF up.  Time: 9.244 sec.
[Worker #2 Jul 4 13:02] PolyF down.  Time: 21.425 sec.
[Worker #2 Jul 4 13:02] PolyF up.  Time: 26.766 sec.
[Worker #2 Jul 4 13:02] PolyF down.  Time: 41.547 sec.
[Worker #2 Jul 4 13:03] gg = mul H(X).  Time: 2.540 sec.
[Worker #2 Jul 4 13:03] Stage 2 complete. 711687 transforms, 6 modular inverses. Total time: 702.843 sec.
[Worker #2 Jul 4 13:03] Stage 2 GCD complete. Time: 0.240 sec.
The resulting B2 is about eight times higher than before. Additionally, stage 2 is about 80−100 % faster now!
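
For anyone who wants to compare such logs between versions, the per-phase timings can be totalled with a short script. This is only a sketch: it assumes the line layout shown in the log above ("label.  Time: N sec."), and the function name `phase_totals` is made up for illustration, not part of prime95.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   [Worker #2 Jul 4 12:53] PolyG built.  Time: 67.360 sec.
# Lines ending in "Total time: ..." are deliberately not matched.
LINE_RE = re.compile(r"\]\s+(?P<label>.+?)\.\s+Time:\s+(?P<secs>\d+\.\d+) sec\.")


def phase_totals(log_text):
    """Return {label: summed seconds} for every 'Time: N sec.' line."""
    totals = defaultdict(float)
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            totals[match.group("label")] += float(match.group("secs"))
    return dict(totals)


if __name__ == "__main__":
    # Two lines copied from the log above, as a small demonstration.
    sample = (
        "[Worker #2 Jul 4 12:53] PolyG built.  Time: 67.360 sec.\n"
        "[Worker #2 Jul 4 12:54] PolyH built.  Time: 36.737 sec.\n"
    )
    for label, secs in phase_totals(sample).items():
        print(f"{label}: {secs:.3f} sec")
```

Feeding the whole stage 2 log to `phase_totals` gives one total per phase (PolyG, PolyH, PolyF up/down and so on), which makes it easier to see where the time goes.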

2022-07-04, 20:35   #15
kruoli
"Oliver"
Sep 2017
Porta Westfalica, DE
1,087 Posts

At least on this version, a benchmark on the default FFT 120K with 4 or 8 workers on an 11700KF hangs indefinitely. (Linux.) 1 or 2 workers run fine.

Last fiddled with by kruoli on 2022-07-04 at 20:37 Reason: Grammar.

2022-07-04, 22:27   #16
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
2×3×5²×53 Posts

Quote:
Originally Posted by kruoli
At least on this version, a benchmark on the default FFT 120K with 4 or 8 workers on an 11700KF hangs indefinitely. (Linux.) 1 or 2 workers run fine.

Try adding SpinWait=1 in prime.txt
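
If you would rather script that change than edit the file by hand, something like the following could work. It is a sketch under assumptions: that prime.txt is a plain key=value file, and that a global option such as SpinWait belongs before the first "[Section]" header. The helper name `set_global_option` is hypothetical. Stop the program before editing so the change is not overwritten.

```python
from pathlib import Path


def set_global_option(path, key, value):
    """Set key=value in the global part of a key=value settings file.

    Returns True if the file was changed, False if it already had the value.
    Assumption: global options sit before the first "[Section]" header.
    """
    settings = Path(path)
    lines = settings.read_text().splitlines() if settings.exists() else []
    new_line = f"{key}={value}"

    # Index of the first section header, or end of file if there is none.
    first_section = next(
        (i for i, line in enumerate(lines) if line.lstrip().startswith("[")),
        len(lines),
    )

    for i in range(first_section):
        if lines[i].split("=", 1)[0].strip() == key:
            if lines[i].strip() == new_line:
                return False  # already set, nothing to do
            lines[i] = new_line  # replace the old value
            break
    else:
        lines.insert(first_section, new_line)  # key not present yet

    settings.write_text("\n".join(lines) + "\n")
    return True
```

Calling `set_global_option("prime.txt", "SpinWait", 1)` twice leaves a single `SpinWait=1` line, so it is safe to rerun.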

2022-07-05, 01:42   #17
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
2×3×5²×53 Posts

Quote:
Originally Posted by kruoli
At least on this version, a benchmark on the default FFT 120K with 4 or 8 workers on an 11700KF hangs indefinitely. (Linux.) 1 or 2 workers run fine.

I'm not having any luck replicating this.

2022-07-05, 17:30   #18
kruoli
"Oliver"
Sep 2017
Porta Westfalica, DE
1,087 Posts

Weird. I was not able to replicate it today, either. Hopefully nothing to worry about.

Last fiddled with by kruoli on 2022-07-05 at 18:23 Reason: Spelling.

2022-07-12, 03:53   #19
masser
Jul 2003
Behind BB
2⁷×3×5 Posts

I know this isn't the preferred use case, but I got this segfault tonight running a P-1 with 30.9. See the attached screenshot.

Here's the worktodo.txt assignment line:

Pminus1=1,2,32599673,-1,180000,0,68,"782392153,45051795327956903,166007139391952858287"
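
For readers unfamiliar with the format, the line above can be pulled apart as follows. This is a sketch, not prime95's own parser: it assumes the first six fields are k, b, n, c, B1, B2 for the number k*b^n+c, treats the quoted string as the list of known factors, and leaves any remaining numeric field (68 here, presumably the trial-factoring depth in bits) uninterpreted. The function names are made up for illustration.

```python
import re


def parse_pminus1(line):
    """Split a 'Pminus1=...' worktodo line into named fields (assumed layout)."""
    work_type, sep, rest = line.strip().partition("=")
    if work_type != "Pminus1" or not sep:
        raise ValueError("not a Pminus1 line")

    # The optional trailing quoted string holds comma-separated known factors.
    quoted = re.search(r'"([^"]*)"\s*$', rest)
    known = [int(f) for f in quoted.group(1).split(",") if f.strip()] if quoted else []
    numeric_part = rest[: quoted.start()] if quoted else rest

    fields = [f.strip() for f in numeric_part.split(",") if f.strip()]
    if len(fields) < 6:
        raise ValueError("expected at least k,b,n,c,B1,B2")
    k, b, n, c, b1, b2 = (int(f) for f in fields[:6])
    return {
        "k": k, "b": b, "n": n, "c": c, "B1": b1, "B2": b2,
        "extra_fields": [int(f) for f in fields[6:]],  # meaning assumed, see above
        "known_factors": known,
    }


def divides(k, b, n, c, factor):
    """True if factor divides k*b^n + c (uses modular exponentiation)."""
    return (k * pow(b, n, factor) + c) % factor == 0
```

With the parsed fields, `divides(k, b, n, c, f)` is a cheap sanity check that each listed factor `f` really divides the number before the line goes into worktodo.txt.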
Attachment: Screen Shot 2022-07-11 at 9.35.38 PM.png (35.1 KB)

Last fiddled with by masser on 2022-07-12 at 03:54

2022-07-13, 10:00   #20
lycorn
"GIMFS"
Sep 2002
Oeiras, Portugal
3026₈ Posts

Quote:
Originally Posted by Prime95
Interestingly, in my limited testing it seems that, unlike P-1, B2 > ~1000 * B1 does not make sense. You are better off running more curves or increasing B1. In some situations, prime95 elects not to use all the available memory -- this will mean the default MaxHighMemWorkers setting might need to change.

I started doing some ECM work using 30.9. If I understood correctly, only stage 2 has benefited from the improvements implemented. So I didn't quite understand why you wrote there's no point in using B2 > ~1000B1.
I am running some tests on very small exponents, as I remember from work done years ago that they were the ones most favoured by GMP-ECM. For M4567, the ECM Progress page indicates ~20k curves already run for B1 = 110,000,000; I started with B1 = 150,000,000 and B2 ~1000B1, but the running time of Stage 2 was ridiculously small compared to Stage 1's.
After some adjustments I settled on B1 = 110,000,000 and B2 = 6e13. On a 4-core i5-7400 @ 3.3 GHz, with 27.5 GB available to Prime95, I am getting 16 minutes and 12 minutes for Stage 1 and Stage 2 respectively. This seems to be in line with some recommendations about the ideal ratio of running times that I recall from previous work (Stage 2 taking ~70% of Stage 1's running time).
So far the software has been pretty stable, and multithreading appears to be working fine. As for whether Prime95 uses all the available memory, I found that it depends solely on the B2 used. With B2 = 1.5e12, stage 2 used ~50% of the allowed memory, with B2 = 3e13 ~70%, and with the current B2 = 6e13 it is using 26.3 GB.
Any idea/recommendation/suggestion for exponent size and/or bounds?

2022-07-13, 14:52   #21
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
7950₁₀ Posts

Quote:
Originally Posted by lycorn
I didn't quite understand why you wrote there's no point in using B2 > ~1000B1.
Any idea/recommendation/suggestion for exponent size and/or bounds?

If you use B2=100*B1, prime95 should select a close-to-optimal B2 (if there aren't any bugs).

The best B2 value is determined by the shortest time to complete the "t" level you are working on.
Say, you are working on t70. Run gmp-ecm to get the number of curves required for the three different B2 values you tried (I think it is the -v switch). Compute prime95-runtime * number of curves required. The B2 with the smallest total runtime is the winner.
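
The comparison described above can be sketched in a few lines. The assumption here is that "prime95-runtime" means the time for one whole curve, i.e. stage 1 plus stage 2; the function name and all numbers in the demonstration are hypothetical placeholders, not measurements.

```python
def best_b2(candidates):
    """Pick the B2 with the smallest total time to finish a t-level.

    candidates maps B2 -> (curves_required, stage1_secs, stage2_secs), where
    curves_required comes from gmp-ecm and the timings from prime95.
    Returns (winning_b2, {b2: total_seconds}).
    """
    totals = {
        b2: curves * (stage1_secs + stage2_secs)
        for b2, (curves, stage1_secs, stage2_secs) in candidates.items()
    }
    winner = min(totals, key=totals.get)
    return winner, totals


if __name__ == "__main__":
    # Placeholder numbers only, to show the shape of the input.
    demo = {
        10**11: (100, 8.0, 2.0),
        10**12: (80, 8.0, 3.0),
        10**13: (70, 8.0, 7.0),
    }
    winner, totals = best_b2(demo)
    print("best B2:", winner)
    for b2, secs in sorted(totals.items()):
        print(f"B2={b2:.0e}: {secs:.0f} sec total")
```

Because stage 1 time is paid once per curve, a B2 that needs fewer curves can win even if its stage 2 is slower.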

Please report back your findings. My 1000*B1 guesstimate was based on the rapidly diminishing returns for larger and larger B2 multipliers.

2022-07-14, 14:34   #22
lycorn
"GIMFS"
Sep 2002
Oeiras, Portugal
2×19×41 Posts

I did some more tests with 30.9. First of all, the conditions of the test:

Exponent: 4567
FFT Size: 256 bytes
Hardware: i5-7400 (4 cores, no HT) @ 3.3GHz.
Available memory (DDR4-3200): 27.5 GB.
Number of workers: 1 (4 cores allowed).
B1 bound: 110M (t55). Average runtime for Stage 1: 950 sec (just under 16 minutes).
B2 bounds: several large bounds (1.5e13, 3e13, 6e13), then 1e11 (1000 * B1), and some smaller bounds down to 105 * B1. Finally I tried B2 = 100 * B1 to see what P95 would choose.

For each of the B2 bounds I used GMP-ECM to get the expected number of curves to complete a t55.

The results were pretty much in line with George's latest post in this thread, in that large bounds have diminishing returns in terms of the time taken to complete the t55. What I found a bit weird was that as I kept lowering the B2 values, the times still got better and better, even for B2 as low as 105 * B1. It would seem that "the lower the better". For this particular value I got a Stage 2 runtime of 5.3 sec, nearly 180 times less than Stage 1's, and a memory utilization of only 738 MB. When I used B2 = 100 * B1, so that P95 would choose the optimal B2, it chose 28,217 * B1, approximately 3.1e12, and the results, in terms of time to complete a t55, were much worse than with all the lower B2 values tried.
So my point is: does this really work this way? I mean, is the time to complete the t55, as given by the product of number of curves * B2 runtime, the only criterion to take into account when choosing an optimal B2 value? Granted, the time each curve takes goes down when we lower B2, but the chance of finding a factor is certainly lower as well, so it seems some sort of compromise should be reached. And when we look at the value chosen by P95 (much larger than values that were yielding lower completion times), we tend to think that must be the case.
I admit I'm a bit confused by the results. In particular, it's a bit difficult to swallow that a run taking 950 sec for Stage 1 and 5.3 sec for Stage 2, and using just 738 MB of memory, was the "best" of all.
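
The figures quoted above can be cross-checked with a little arithmetic. This sketch uses only numbers from this post (B1 = 110M, the 28,217 multiplier, 950 sec and 5.3 sec); the helper names are made up for illustration.

```python
def implied_b2(b1, multiplier):
    """B2 expressed as a multiple of B1."""
    return b1 * multiplier


def runtime_ratio(stage1_secs, stage2_secs):
    """How many times longer stage 1 runs than stage 2."""
    return stage1_secs / stage2_secs


def stage2_share(stage1_secs, stage2_secs):
    """Fraction of one curve's runtime spent in stage 2."""
    return stage2_secs / (stage1_secs + stage2_secs)


if __name__ == "__main__":
    B1 = 110_000_000
    print("B2 chosen by P95:", implied_b2(B1, 28_217))
    print("stage 1 / stage 2:", round(runtime_ratio(950.0, 5.3), 1))
    print("stage 2 share of a curve:", f"{stage2_share(950.0, 5.3):.2%}")
```

The first value comes out at about 3.1e12 and the ratio just under 180, matching the numbers reported above. The last line shows how small a part of each curve stage 2 is at B2 = 105 * B1.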

The next thing is multithreading. When I did the first set of tests, the program reported, for stage 2, that 3 polymult helper threads were assigned to cores 2, 3 and 4. That seemed fine.
Then, using the B2 recommended by P95, I tried to fire up 2 workers with 2 threads each, as the maximum memory used would comfortably fit in the 27.5 GB available. In stage 2, each worker was assigned one helper, as expected. The time taken to run stage 2 was just 20-25% more than with a single worker that gets 3 helper threads. Is that expected, or is the program not taking enough advantage of additional helper threads during stage 2, meaning that some multithreading "tweaking" would be a plus?

Any comments/guidance would be appreciated. I'll be happy to run more tests if required.
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
