#12
P90 years forever!
Aug 2002
Yeehaw, FL
17736₈ Posts
#13
"Glenn Leider"
Apr 2021
Carlsbad, CA
33 Posts
Okay, will continue to use the latest release of 30.8 for now. Thanks for the quick response.
#14
"Oliver"
Sep 2017
Porta Westfalica, DE
10100101001₂ Posts
Some example of one of my machines (Intel Atom):
Code:
[Worker #2 Jul 4 12:50] Stage 1 complete. 1910782 transforms, 1 modular inverses. Total time: 1122.019 sec.
[Worker #2 Jul 4 12:50] Available memory is 4092MB.
[Worker #2 Jul 4 12:50] Optimal B2 is 1017*B1 = 254250000. Actual B2 will be 254268105.
[Worker #2 Jul 4 12:50] Estimated stage 2 vs. stage 1 runtime ratio: 0.357
[Worker #2 Jul 4 12:50] Setting affinity to run helper thread 1 on CPU core #4
[Worker #2 Jul 4 12:50] Using 3932MB of memory. D: 19110, degree-2016 polynomials. Ftree polys in memory: 2
[Worker #2 Jul 4 12:50] Setting affinity to run polymult helper thread on CPU core #4
[Worker #2 Jul 4 12:52] Stage 2 init complete. 123530 transforms, 1 modular inverses. Time: 120.117 sec.
[Worker #2 Jul 4 12:53] PolyG built. Time: 67.360 sec.
[Worker #2 Jul 4 12:53] M675347 stage 2 at B2=61639305 [16.66%]. Time: 0.000 sec.
[Worker #2 Jul 4 12:54] PolyG built. Time: 67.692 sec.
[Worker #2 Jul 4 12:54] PolyH built. Time: 36.737 sec.
[Worker #2 Jul 4 12:54] M675347 stage 2 at B2=100165065 [33.33%]. Time: 0.000 sec.
[Worker #2 Jul 4 12:56] PolyG built. Time: 67.937 sec.
[Worker #2 Jul 4 12:56] PolyH built. Time: 35.953 sec.
[Worker #2 Jul 4 12:56] M675347 stage 2 at B2=138690825 [49.99%]. Time: 0.000 sec.
[Worker #2 Jul 4 12:57] PolyG built. Time: 68.429 sec.
[Worker #2 Jul 4 12:58] PolyH built. Time: 37.144 sec.
[Worker #2 Jul 4 12:58] M675347 stage 2 at B2=177216585 [66.66%]. Time: 0.000 sec.
[Worker #2 Jul 4 12:59] PolyG built. Time: 67.681 sec.
[Worker #2 Jul 4 13:00] PolyH built. Time: 36.372 sec.
[Worker #2 Jul 4 13:00] M675347 stage 2 at B2=215742345 [83.33%]. Time: 0.000 sec.
[Worker #2 Jul 4 13:01] PolyG built. Time: 69.471 sec.
[Worker #2 Jul 4 13:01] PolyH built. Time: 36.607 sec.
[Worker #2 Jul 4 13:02] H(X) scaled. Time: 9.851 sec.
[Worker #2 Jul 4 13:02] PolyF up. Time: 9.244 sec.
[Worker #2 Jul 4 13:02] PolyF down. Time: 21.425 sec.
[Worker #2 Jul 4 13:02] PolyF up. Time: 26.766 sec.
[Worker #2 Jul 4 13:02] PolyF down. Time: 41.547 sec.
[Worker #2 Jul 4 13:03] gg = mul H(X). Time: 2.540 sec.
[Worker #2 Jul 4 13:03] Stage 2 complete. 711687 transforms, 6 modular inverses. Total time: 702.843 sec.
[Worker #2 Jul 4 13:03] Stage 2 GCD complete. Time: 0.240 sec.
#15
"Oliver"
Sep 2017
Porta Westfalica, DE
1,321 Posts
At least on this version, a benchmark of the default 120K FFT with 4 or 8 workers on an 11700KF hangs indefinitely (Linux); 1 or 2 workers run fine.
Last fiddled with by kruoli on 2022-07-04 at 20:37 Reason: Grammar.
#16
P90 years forever!
Aug 2002
Yeehaw, FL
2·4,079 Posts
#17
P90 years forever!
Aug 2002
Yeehaw, FL
1111111011110₂ Posts
#18
"Oliver"
Sep 2017
Porta Westfalica, DE
1,321 Posts
Weird. I was not able to replicate it today, either. Hopefully nothing to worry about.
Last fiddled with by kruoli on 2022-07-05 at 18:23 Reason: Spelling.
#19
Jul 2003
Behind BB
3657₈ Posts
I know this isn't the preferred use case, but I got this segfault tonight running a P-1 with 30.9. See the attached screenshot.
Here's the worktodo.txt assignment line:
Code:
Pminus1=1,2,32599673,-1,180000,0,68,"782392153,45051795327956903,166007139391952858287"
Last fiddled with by masser on 2022-07-12 at 03:54
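For anyone reading along, here is a hedged sketch of how that worktodo entry breaks down, assuming the commonly described field order Pminus1=k,b,n,c,B1,B2,how_far_factored,"known_factors" (for a number of the form k*b^n+c); check the Prime95 documentation before relying on this layout.

```python
# Assumed field order: Pminus1=k,b,n,c,B1,B2,how_far_factored,"known_factors".
# This is an illustrative parse, not an official Prime95 parser.
line = ('Pminus1=1,2,32599673,-1,180000,0,68,'
        '"782392153,45051795327956903,166007139391952858287"')

body = line.split("=", 1)[1]
fields, _, factors = body.partition(',"')      # split off the quoted factor list
k, b, n, c, B1, B2, tf_bits = (int(x) for x in fields.split(","))
known_factors = [int(f) for f in factors.rstrip('"').split(",")]

print(f"P-1 on {k}*{b}^{n}{c:+d}, B1={B1}, B2={B2} (0 = let Prime95 choose)")
print(f"trial-factored to 2^{tf_bits}; {len(known_factors)} known factors")
```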
#20
"GIMFS"
Sep 2002
Oeiras, Portugal
2·5·157 Posts
I am running some tests on very small exponents; as I remember from some work done years ago, they were the ones most favoured by GMP-ECM. For M4567, the ECM Progress page indicates ~20k curves already run for B1 = 110,000,000. I started with B1 = 150,000,000 and B2 ≈ 1000*B1, but the running time of Stage 2 was ridiculously small compared to Stage 1's. After some adjustments I settled on B1 = 110,000,000 and B2 = 6e13.

On a 4-core i5-7400 @ 3.3 GHz, with 27.5 GB available to Prime95, I am getting 16 minutes and 12 minutes for Stage 1 and Stage 2 respectively. This seems to be in line with some recommendations about the ideal runtime ratio I recall from previous work (Stage 2 taking ~70% of Stage 1's running time).

So far the software has been pretty stable, and multithreading appears to be working fine. As for whether Prime95 uses the available memory, I found that depends solely on the B2 used: with B2 = 1.5e12, Stage 2 used ~50% of the allowed memory; with B2 = 3e13, ~70%; and with the current B2 = 6e13 it is using 26.3 GB.

Any idea/recommendation/suggestion for exponent size and/or bounds?
#21
P90 years forever!
Aug 2002
Yeehaw, FL
2·4,079 Posts
The best B2 value is the one that gives the shortest time to complete the "t" level you are working on. Say you are working on t70: run gmp-ecm to get the number of curves required for each of the three B2 values you tried (I think it is the -v switch), then compute prime95-runtime * number-of-curves-required. The B2 with the smallest total runtime is the winner. Please report back your findings.

My 1000*B1 guesstimate was based on the rapidly diminishing returns from larger and larger B2 multipliers.
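The selection procedure described above can be sketched in a few lines. All numbers here are illustrative placeholders, not real measurements: the per-curve times would come from timing Prime95 at each B2, and the curve counts from gmp-ecm's verbose output for the target "t" level.

```python
# For each candidate B2: total time to the "t" level =
# (Prime95 seconds per curve at that B2) * (curves gmp-ecm says are needed).
# Pick the B2 minimizing that product. Placeholder data below.
candidates = [
    # (B2, secs per curve in Prime95, curves needed per gmp-ecm)
    (1.5e12, 1100.0, 18000),
    (3.0e12, 1300.0, 16500),
    (6.0e13, 1700.0, 15000),
]

def total_time(entry):
    _b2, secs_per_curve, curves = entry
    return secs_per_curve * curves

best = min(candidates, key=total_time)
print(f"best B2 = {best[0]:.1e}, total = {total_time(best)/3600:.1f} core-hours")
# → best B2 = 1.5e+12, total = 5500.0 core-hours
```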
#22
"GIMFS"
Sep 2002
Oeiras, Portugal
2×5×157 Posts
I did some more tests with 30.9. First of all, the conditions of the test:
Exponent: 4567
FFT size: 256 bytes
Hardware: i5-7400 (4 cores, no HT) @ 3.3 GHz
Available memory (DDR4-3200): 27.5 GB
Number of workers: 1 (4 cores allowed)
B1 bound: 110M (t55)
Average runtime for Stage 1: 950 sec (just under 16 minutes)
B2 bounds: several large bounds (1.5e13, 3e13, 6e13), then 1e11 (1000 * B1), and some smaller bounds down to 105 * B1. Finally I tried B2 = 100 * B1 to see what P95 would choose.

For each of the B2 bounds I used GMP-ECM to get the expected number of curves to complete a t55. The results were pretty much in line with George's latest post in this thread, in that large bounds have diminishing returns in terms of the time taken to complete the t55.

Now, what I found a bit weird was that as I kept lowering the B2 values the times still got better and better, even for B2 as low as 105 * B1; it would seem that "the lower the better". For this particular value I got a Stage 2 runtime of 5.3 sec, nearly 180 times less than Stage 1's, and a memory utilization of only 738 MB. When I used B2 = 100 * B1, so that P95 would choose the optimal B2, it chose 28,217 * B1, approximately 3.1e12, and the results, in terms of time to complete a t55, were much worse than with all the lower B2 values tried.

So my point is: does this really work this way? I mean, is the time to complete the t55, as given by the product of number of curves * per-curve runtime, the only criterion to take into account when choosing an optimal B2 value? Granted, the time each curve takes goes down when we lower B2, but the chance of finding a factor is certainly lower as well, so it seems some sort of compromise should be reached. And when we look at the value chosen by P95 (much larger than values that were yielding lower completion times), we tend to think that must be the case. I admit I'm a bit confused with the results.

In particular, it's a bit difficult to swallow that a run taking 950 sec for Stage 1 and 5.3 sec for Stage 2, and using just 738 MB of memory, was the "best" of all.

Next thing is multithreading. When I did the first set of tests, the program would report, for Stage 2, that 3 polymult helper threads were assigned to cores 2, 3 and 4. That seemed fine. Then, using P95's recommended B2, I tried to fire up 2 workers with 2 threads each, as the maximum memory used would comfortably fit in the 27.5 GB available. In Stage 2, each worker was assigned one helper, as expected. Now the time taken to run Stage 2 was just 20-25% more than with just one worker (which would get 3 helper threads). Is that expected, or isn't the program taking enough advantage of more helper threads during Stage 2, meaning that some multithreading "tweaking" would be a plus?

Any comments/guidance would be appreciated. I'll be happy to run more tests if required.
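The arithmetic behind the "lower B2 kept winning" observation can be illustrated with made-up numbers (these are not the actual measurements from the post): each curve costs its Stage 1 time plus its Stage 2 time, so if the required-curve count grows only slowly as B2 shrinks, the tiny-B2 runs can still finish the t55 soonest.

```python
# Illustrative placeholder data: stage-2 seconds per curve and the
# hypothetical gmp-ecm curve counts for a t55 at each B2 multiplier.
# Only the 950-sec Stage 1 time is taken from the post above.
stage1_secs = 950.0

trials = [
    # (B2 multiplier, stage-2 secs per curve, curves needed for t55)
    (105,     5.3,  44000),
    (1000,   60.0,  42000),
    (28217, 720.0,  38000),
]

for mult, stage2_secs, curves in trials:
    hours = curves * (stage1_secs + stage2_secs) / 3600
    print(f"B2 = {mult:>5}*B1: {hours:8.0f} core-hours to t55")
```

With these numbers the smallest B2 wins because Stage 1 dominates the per-curve cost; the open question in the post, whether the optimizer should weigh anything beyond this product, is not answered by the arithmetic itself.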