Some Bounds Testing
With 8-core i7-7820X PC
20.8M exponents at TF75 (trial-factored to 2^75).
Assignments look like: Pminus1=N/A,1,2,20858423,-1,800000,0,75

Test: alter the RAM allocated.
B2Mult is taken from this line: "With trial factoring done to 2^75, optimal B2 is 327*B1 = 261600000."
Interestingly, the Mult% (ratio) is close to the RAM ratio.
As RAM dropped, B2Mult dropped and hence Pct. (chance of a new factor) dropped, but run time remained the same.
Code:
RAM (GB)  B2Mult  Pct.   Mult%
24        327     5.35%
16        227     5.08%  69.4%
12        173     4.87%  52.9%
8         124     4.63%  37.9%

Granted, the numbers from this website do NOT exactly agree with the v30.8 numbers displayed, but hopefully they are relatively consistent, enough to make the following reliable.

What new B1 do I need, given the same B2Mult, to get the same Pct.? (B1M is Bound 1 in millions.)

Same Pct., but GDs (GHz-days) drop.
Code:
RAM (GB)  B2Mult  B1M   B2M     Pct.   GDs
24        327     0.8   261.60  5.41%  15.42
16        227     0.96  217.92  5.41%  13.13
12        173     1.1   190.30  5.40%  11.72
8         124     1.32  163.68  5.41%  10.45

(I'm guessing this will be closer to the same clock time.)

Same GDs, but Pct. increases.
Code:
RAM (GB)  B2Mult  B1M   B2M     Pct.   GDs
24        327     0.8   261.60  5.41%  15.42
16        227     1.13  256.51  5.66%  15.46
12        173     1.45  250.85  5.83%  15.45
8         124     1.95  241.80  6.01%  15.43

Last fiddled with by petrw1 on 2022-01-15 at 18:21. Reason: Removed My Vote just added
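The columns in the tables above follow from simple arithmetic: B2 is B1 times B2Mult, and Mult% is each B2Mult divided by the 24 GB value of 327. A small Python sketch recomputes those columns and puts the RAM ratio next to Mult%. The variable and function names are mine, not Prime95's, and RAM is assumed to be in GB.

```python
# Rows from the first table: RAM (assumed GB) -> B2Mult reported by Prime95.
b2mult_by_ram = {24: 327, 16: 227, 12: 173, 8: 124}

BASE_RAM = 24   # the 24 GB run is the reference row
B1 = 800_000    # B1 from the Pminus1 assignment line


def ratios(ram):
    """Return (Mult%, RAM%) relative to the 24 GB row, rounded to 0.1."""
    mult_pct = 100 * b2mult_by_ram[ram] / b2mult_by_ram[BASE_RAM]
    ram_pct = 100 * ram / BASE_RAM
    return round(mult_pct, 1), round(ram_pct, 1)


def b2_millions(b1_millions, b2mult):
    """B2 in millions, given B1 in millions and the B2 multiplier."""
    return round(b1_millions * b2mult, 2)


if __name__ == "__main__":
    print("B2 at 24 GB:", B1 * b2mult_by_ram[BASE_RAM])
    for ram in b2mult_by_ram:
        mult_pct, ram_pct = ratios(ram)
        print(f"{ram:>2} GB  Mult% {mult_pct:5.1f}  RAM% {ram_pct:5.1f}")
```

Mult% runs a few points above the RAM ratio in every row (for example 69.4% against 66.7% at 16 GB), which matches the "close to" wording in the post.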
Quote:
I had a similar success rate: 5.35% vs. 5.41%. The run time was longer: Stage 1 took about 15% longer, as expected. Stage 2 did have a lower Bound2 multiplier (181 vs. 327), but with half as much RAM it took close to the same time as the prior run with 24 GB of RAM.

i7-7820X, 24 GB RAM

20.8M - 0.8M/261.6M (B2=327xB1) - Stage 1: 10 Min / Stage 2: 9 Min = 19 Min Total - 5.35% / 17.6777 GHz-days
10.4M - 1.76M/1169M (B2=664xB1) - Stage 1: 22 Min / Stage 2: 13 Min = 35 Min Total - 6.74% / 36.5 GHz-days

I ran both tests with TF=75 bits rather than the actual TF level of 74 for the 10.4M. Should I have?

For an exponent 50% smaller, 2.2x B1 seems to be too much, especially for Stage 1 run times.

10.4M, 75 bits - 1.2M/786M (B2=655xB1) - Stage 1: 11 Min / Stage 2: 9 Min = 20 Min Total - 6.10% / 24.5581 GHz-days
10.4M, 72 bits - 1.2M/846M (B2=705xB1) - Stage 1: 11 Min / Stage 2: 9.5 Min = 20.5 Min Total - 7.89% / 26.3972 GHz-days

1.5x seems a good fit, at least for this PC and for 20.8M vs. 10.4M.
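To compare these runs at a glance, the following Python sketch tabulates the figures quoted above and derives the total time and the B1 multiplier relative to the 20.8M run. The tuple layout and names are my own choices; the numbers are copied from the post.

```python
# (label, B1 in millions, stage 1 minutes, stage 2 minutes)
runs = [
    ("20.8M TF75", 0.80, 10, 9),
    ("10.4M TF75", 1.76, 22, 13),
    ("10.4M TF75", 1.20, 11, 9),
    ("10.4M TF72", 1.20, 11, 9.5),
]

BASE_B1 = runs[0][1]  # the 20.8M run is the reference


def summarize(run):
    """Return (total minutes, B1 multiplier vs. the 20.8M run)."""
    _label, b1, stage1, stage2 = run
    return stage1 + stage2, round(b1 / BASE_B1, 2)


if __name__ == "__main__":
    for run in runs:
        total, b1_mult = summarize(run)
        print(f"{run[0]:<11} B1 x{b1_mult:<4} total {total} min")
```

The 1.5x runs land within about a minute and a half of the 19-minute reference, while the 2.2x run takes 35 minutes.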
Quote:
While 2.2 might be a bit high, 2x or 1.9x _should have_ given comparable timings (twice as many iterations, half the per-iteration time). Yet you're off by 2.2x; it is as if the smaller FFT wasn't any faster at all. Does. Not. Make. Sense.
Last fiddled with by axn on 2022-01-16 at 10:27
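The expectation in this reply can be written as a one-line model: stage 1 time scales with the number of iterations (proportional to B1) times the per-iteration time. The sketch below assumes, as the reply does, that halving the exponent halves the per-iteration time. That is a simplification, and the function name is mine.

```python
def expected_stage1_minutes(base_minutes, b1_ratio, per_iter_ratio):
    """Scale a measured stage 1 time by iteration count and per-iteration cost.

    b1_ratio       : new B1 / old B1 (iterations grow in proportion to B1)
    per_iter_ratio : new per-iteration time / old per-iteration time
    """
    return base_minutes * b1_ratio * per_iter_ratio


# Reference: 20.8M exponent, B1 = 0.8M, stage 1 took 10 minutes.
BASE_STAGE1 = 10

# 10.4M exponent with 2.2x B1, assuming half the per-iteration time.
predicted = expected_stage1_minutes(BASE_STAGE1, 2.2, 0.5)
observed = 22  # minutes, from the 10.4M run with B1 = 1.76M

if __name__ == "__main__":
    print(f"predicted {predicted:.1f} min, observed {observed} min")
```

Under this model the 2.2x run should have needed roughly 11 minutes of stage 1. The observed 22 minutes is what the model gives if the per-iteration time did not drop at all (a ratio of 1.0 instead of 0.5).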

Quote:
Even with version 29 of Prime95 I got the best P-1 throughput with 8 cores / 1 worker.

Exp / B1 : FFT1 / FFT2 : Stage 1 / Stage 2
20.8M / 0.8M : 1152K / 1280K : 9 Min / 9 Min
10.4M / 1.2M : 560K / 640K : 15 Min / 9 Min - Not sure why Stage 1 is slow here
5.2M / 1.8M : 280K / 320K : 8 Min / 7 Min - But the times seem better here

Yes, I'm thinking 1.5x is too low.

Anyone else want to run a few tests? We are trying to determine how much to increase B1 when the exponent halves to get the same run time. We think it is about 2x.
Last fiddled with by petrw1 on 2022-01-17 at 04:15
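One rough way to read the timings above is to divide each stage 1 time by B1 times the stage 1 FFT length, which gives a cost per unit of work if time were simply proportional to both. That proportionality is an assumption of this sketch (it ignores the log factor in FFT cost and any multi-threading effects), and the names are mine.

```python
# (exponent label, B1 in millions, stage 1 FFT length in K, stage 1 minutes)
rows = [
    ("20.8M", 0.8, 1152, 9),
    ("10.4M", 1.2, 560, 15),
    ("5.2M", 1.8, 280, 8),
]


def unit_cost(row):
    """Stage 1 minutes per (B1 in millions x FFT length in K)."""
    _label, b1_millions, fft_k, minutes = row
    return minutes / (b1_millions * fft_k)


def relative_costs(table):
    """Unit cost of each row relative to the first row, rounded to 0.01."""
    base = unit_cost(table[0])
    return {row[0]: round(unit_cost(row) / base, 2) for row in table}


if __name__ == "__main__":
    for label, rel in relative_costs(rows).items():
        print(f"{label:<6} relative stage 1 cost: {rel}")
```

On these numbers the 10.4M row costs a bit over twice as much per unit as the 20.8M row, and the 5.2M row about 1.6 times as much, so the 10.4M stage 1 time stands out as the outlier.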

I can give you timings for my working range:
3 cores / 1 worker, 10 GB of memory
8.5M / 1.56M : 448K / 512K : 1550 sec / 1000 sec
Last fiddled with by firejuggler on 2022-01-17 at 05:38
Quote:
I noticed that some FFTs are faster when using one worker on an 18-core CPU (Xeon W-2295), but slow down when using two or three workers. But FFTs that were slower to begin with (one worker) do not slow down when using multiple workers.

Quote:
Now could you try one or both of these to see if the run times are about the same?
- an exponent half the size with double the B1
- an exponent double the size with half the B1
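The two proposed tests can be turned into worktodo lines mechanically. This Python sketch follows the assignment format quoted at the top of the thread, assuming the field after the exponent is -1 (for 2^p - 1) and that a B2 of 0 leaves the choice of B2 to the program. The helper names are hypothetical, and the caller still has to supply a real prime exponent of roughly the right size.

```python
def pminus1_line(exponent, b1, tf_bits, b2=0):
    """Format a P-1 assignment in the style quoted in this thread."""
    return f"Pminus1=N/A,1,2,{exponent},-1,{b1},{b2},{tf_bits}"


def scaled_b1(base_b1, base_exponent, new_exponent):
    """Scale B1 inversely with exponent size (half the exponent, double B1)."""
    return round(base_b1 * base_exponent / new_exponent)


# Reference assignment from the first post.
BASE_EXPONENT = 20858423
BASE_B1 = 800000

if __name__ == "__main__":
    print(pminus1_line(BASE_EXPONENT, BASE_B1, 75))
```

For a chosen prime exponent `p`, `pminus1_line(p, scaled_b1(BASE_B1, BASE_EXPONENT, p), 75)` gives the matching test line. The inverse scaling is only the "about 2x" hypothesis under discussion here, not an established rule.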

