#23 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2·3,691 Posts |
The optimal number of cores per worker depends on FFT size. A very small FFT may run best with a single core per worker; a very large FFT may run best using all available cores, even on high-core-count systems. The general rule is to default to 4 cores/worker, but that applies to FFTs at the DC and first-test wavefront sizes (currently ~3-6M).
Last fiddled with by kriesel on 2022-07-14 at 16:01 |
#24 |
"GIMFS"
Sep 2002
Oeiras, Portugal
1,571 Posts |
Yes, that is certainly the case for ECM stage 1 on FFTs this small, so P95 runs single-threaded. For stage 2, the program uses 3 helpers even though the FFT size is probably about the same; I think it has to do with the polynomial multiplication.
#25 | |
"Curtis"
Feb 2005
Riverside, CA
130148 Posts |
Quote:
However, the timing curve is quite broad near the peak, so I choose the largest B1/B2 that are within, say, 5% of the best expected time for a T-level, to maximize the amount of work done toward T60 while I'm doing the T55. To me, it makes sense to give up a bit of efficiency on smaller factors to gain a larger chance of finding a bigger factor. For instance, with GMP-ECM and default B2 values, B1 = 6e7 completes a T50 faster than B1 = 43e6 and also improves the chance of finding 52+ digit factors. I wonder if this is true with the new P95 as well.
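The selection rule described here (take the largest B1 whose expected time is within about 5% of the best) could be sketched as follows. This is only an illustration: the function name `pick_b1`, the `tolerance` parameter, and the timing numbers are all made up for the example and are not measurements from GMP-ECM or Prime95.

```python
def pick_b1(timings, tolerance=0.05):
    """Return the largest B1 whose expected time is within `tolerance` of the best.

    `timings` maps a B1 value to the expected time to complete one T-level.
    """
    best = min(timings.values())
    # Keep every B1 that costs at most (1 + tolerance) times the best time.
    eligible = [b1 for b1, t in timings.items() if t <= best * (1 + tolerance)]
    return max(eligible)


# Hypothetical expected hours to finish a T-level at each B1 (invented numbers).
timings = {43e6: 100.0, 6e7: 98.0, 8e7: 101.5, 11e7: 110.0}

chosen = pick_b1(timings)
print(f"chosen B1 = {chosen:.0e}")
```

Setting `tolerance=0.0` reduces this to simply picking the fastest B1, so the tolerance is the knob that trades a little efficiency for a better chance at larger factors.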
#26 |
"GIMFS"
Sep 2002
Oeiras, Portugal
1,571 Posts |
Yes, that is right; I just didn't make myself clear enough. I meant that for a given value of B1, in this case 110M, just reducing B2 didn't seem to be a valid approach. I gave the extreme example of B1 = 110M and B2 = 105 * B1 yielding the lowest time to complete t55, even though such a small B2 didn't seem a sensible choice. In fact, Prime95 itself chose a larger B2 even though the time to complete t55 was then larger than with B2 = 105 * B1, as described in my post.
Last fiddled with by lycorn on 2022-07-14 at 21:43 |
#27 |
Einyen
Dec 2003
Denmark
D6F16 Posts |
I was starting some timing tests as well, just running on 1 core with 24GB RAM, but I got several SUMOUT errors during stage 1:

ECM2=1,2,2267,-1,800000000,80000000000,1

It did finish stage 1 at least 1 time so far with 1 SUMOUT error, now 3 SUMOUT errors so far in curve #2. I have these in prime.txt since I just copied my normal file:

SumInputsErrorCheck=1
OutputRoundoff=1

I'm trying now to force FFT 160 instead of 128 and see if that helps.

Edit: It seems 128 FFT is too large for M2267, and 96 FFT is too small. Trying M2719 at 128 FFT instead.

Last fiddled with by ATH on 2022-07-15 at 03:13 |
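For readers unfamiliar with the worktodo entry in this post, here is a small sketch that splits it into named fields. It assumes the field order is k, b, n, c, B1, B2, curves (the number being k*b^n+c), which is my understanding of Prime95's ECM2 format rather than something stated in the thread; the `Ecm2Entry` class and `parse_ecm2` function are hypothetical helpers, not part of Prime95.

```python
from dataclasses import dataclass


@dataclass
class Ecm2Entry:
    # Assumed field order: the number is k * b**n + c.
    k: int
    b: int
    n: int
    c: int
    b1: int
    b2: int
    curves: int


def parse_ecm2(line):
    """Parse an 'ECM2=...' line into an Ecm2Entry (first seven fields only)."""
    key, _, value = line.partition("=")
    if key.strip() != "ECM2":
        raise ValueError(f"not an ECM2 entry: {line!r}")
    fields = value.split(",")
    if len(fields) < 7:
        raise ValueError("expected at least 7 comma-separated fields")
    return Ecm2Entry(*(int(f) for f in fields[:7]))


entry = parse_ecm2("ECM2=1,2,2267,-1,800000000,80000000000,1")
print(entry)
print("B2 / B1 =", entry.b2 // entry.b1)
```

Under that reading, the entry asks for one curve on 2^2267-1 with B1 = 8e8 and B2 = 100 * B1.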
#28 | ||
P90 years forever!
Aug 2002
Yeehaw, FL
23·1,021 Posts |
Quote:
Example: 110M, 105*B1, 950+5.3, 42000

I'm surprised at your preliminary results. Sounds like prime95's optimal B2 guess needs work.

Quote:
#29 |
"GIMFS"
Sep 2002
Oeiras, Portugal
1,571 Posts |
Summary of results:

The number of curves to run was given by GMP-ECM. The B1 runtime is an average value. Runtimes in seconds. Tests done using 1 worker with 4 physical cores allowed to run Prime95.

Exponent: 4567

B1 | B2 | Runtime, stage 1 + stage 2 (s) | Curves to run |
110 M | 1000 * B1 | 950 + 17.9 | 25849 |
110 M | 500 * B1 | 950 + 11.9 | 29306 |
110 M | 200 * B1 | 950 + 7.1 | 35419 |
110 M | 105 * B1 | 950 + 5.3 | 40485 |
110 M | 100 * B1 (actual B2 = 28217 * B1, computed by Prime95) | 950 + 110.7 | 14396 |

For larger values of B2, stage 2 runtime would grow accordingly:

B1 | B2 | Runtime, stage 1 + stage 2 (s) | Curves to run |
110 M | 1.5e13 | 950 + 293.4 | 11285 |
110 M | 3.0e13 | 950 + 500.7 | 10211 |
110 M | 6.0e13 | 950 + 793 | 9307 |
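To compare the rows above on one scale, a quick sketch can multiply the per-curve runtime by the curve count. This assumes the total time for the t55 is simply curves × (stage 1 + stage 2) seconds on this one 4-core worker; the row labels and the `total_seconds` helper are my own naming, and the numbers are copied from the summary.

```python
# (B2 label, stage 1 seconds, stage 2 seconds, curves to run), from the summary.
rows = [
    ("1000 * B1", 950, 17.9, 25849),
    ("500 * B1", 950, 11.9, 29306),
    ("200 * B1", 950, 7.1, 35419),
    ("105 * B1", 950, 5.3, 40485),
    ("28217 * B1 (Prime95)", 950, 110.7, 14396),
    ("1.5e13", 950, 293.4, 11285),
    ("3.0e13", 950, 500.7, 10211),
    ("6.0e13", 950, 793.0, 9307),
]


def total_seconds(stage1, stage2, curves):
    """Assumed cost model: every curve costs one stage 1 plus one stage 2."""
    return (stage1 + stage2) * curves


totals = {label: total_seconds(s1, s2, n) for label, s1, s2, n in rows}

# Print from least to most total time, converted to days for readability.
for label, secs in sorted(totals.items(), key=lambda kv: kv[1]):
    print(f"{label:>22}  {secs / 86400:8.1f} days")
```

The printout is sorted from least to most total time, so the first line is the B2 that this simple cost model favours.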
#30 |
"GIMFS"
Sep 2002
Oeiras, Portugal
30438 Posts |
Additionally, the amount of stage 2 memory used (in MB) for the different B2 values was:

B2 | Stage 2 memory (MB) |
105 * B1 | 738 |
200 * B1 | 949 |
500 * B1 | 1159 |
1000 * B1 | 1778 |
100 * B1 (actual B2 chosen by Prime95 = 28217 * B1) | 9813 |
1.5e13 | 18498 |
3.0e13 | 18498 (yes, it was the same value) |
6.0e13 | 26359 |
#31 |
Einyen
Dec 2003
Denmark
19×181 Posts |
M2719 had many SUMOUT errors as well, so I set:

SumInputsErrorCheck=0

and now it seems to run fine. Does that mean the SUMOUT errors are just hidden now but still there, or are they false?
#32 |
P90 years forever!
Aug 2002
Yeehaw, FL
23·1,021 Posts |
Just hidden. SUMOUT checks are only available in SSE2 FFTs (old computer?). SUMOUT checks were the first error checks prime95 used. They are "fuzzy". Two floating point check values are supposed to be equal, but since floats are inexact, prime95 checks that the two values are "really close" to equal. You probably have some outliers that were just beyond "really close".
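The idea of a "fuzzy" equality check can be illustrated in a few lines. This is a generic sketch, not prime95's actual SUMOUT code: the function name `sums_agree` and the tolerance `rel_tol=1e-9` are assumptions chosen only for the example.

```python
import math


def sums_agree(expected, actual, rel_tol=1e-9):
    """Fuzzy equality: accept values that differ only by a tiny relative error."""
    return math.isclose(expected, actual, rel_tol=rel_tol)


a = 0.1 + 0.2   # accumulates a small rounding error
b = 0.3

print(a == b)                          # exact comparison is too strict for floats
print(sums_agree(a, b))                # within tolerance, so treated as equal
print(sums_agree(1.0, 1.0 + 1e-6))     # an outlier just beyond "really close"
```

Wherever the threshold is placed, a legitimate result whose rounding error lands slightly past it gets flagged, which is the kind of outlier described in the post.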
#33 | |
Einyen
Dec 2003
Denmark
19·181 Posts |
Quote:
I will just continue with SumInputsErrorCheck=0 and hope it is just "almost really close" values. I'm not trying to find factors anyway, just testing the stage 2 speed for different values of B2.
Last fiddled with by ATH on 2022-07-15 at 17:30 |