Let's Optimize P-1 for low exponents. TL;DR in post #1. More in posts 60 and 61.
2022-01-15, 15:28   #45
nordi

Dec 2016

2·59 Posts

Quote:
 Originally Posted by firejuggler A question I have: is it worth rerunning a P-1 with the same bound if you find a factor in stage 1? (adding the factor found, obviously)
You mean in order to potentially find another factor in stage 2? Yes, it is.

2022-01-15, 17:57   #46
petrw1
1976 Toyota Corona years forever!

"Wayne"
Nov 2006
Saskatchewan, Canada

12074₈ Posts

Some Bounds Testing
With 8-core i7-7820x PC
20.8M exponents at TF75
Assignments look like: Pminus1=N/A,1,2,20858423,-1,800000,0,75

Test: Alter RAM allocated
B2-Mult is from this line: With trial factoring done to 2^75, optimal B2 is 327*B1 = 261600000.
Interestingly, the Mult% (Ratio) is close to the RAM Ratio.
As RAM dropped, B2-Mult dropped and hence Pct. (Chance of a New Factor) dropped, but run time remained the same.
Code:
RAM  B2-Mult  Pct.   Mult%
24   327      5.35%
16   227      5.08%  69.4%
12   173      4.87%  52.9%
8    124      4.63%  37.9%

Then I used this to calculate the 2 following tables.
Granted, the numbers from this website do NOT exactly agree with the v30.8 numbers displayed, but hopefully they are relatively consistent, enough to make the following reliable.

What new B1 do I need given the same B2-Mult to get the same Pct.? (B1-M is Bound 1 in Millions)
Same Pct. but GDs drops.
Code:
RAM  B2-Mult  B1-M  B2-M    Pct.   GDs
24   327      0.8   261.60  5.41%  15.42
16   227      0.96  217.92  5.41%  13.13
12   173      1.1   190.30  5.40%  11.72
8    124      1.32  163.68  5.41%  10.45

What new B1 do I need given the same B2-Mult to get the same GhzDays? (I'm guessing this will be closer to the same clock-time)
Same GDs but Pct. increases.
Code:
RAM  B2-Mult  B1-M  B2-M    Pct.   GDs
24   327      0.8   261.60  5.41%  15.42
16   227      1.13  256.51  5.66%  15.46
12   173      1.45  250.85  5.83%  15.45
8    124      1.95  241.80  6.01%  15.43

Last fiddled with by petrw1 on 2022-01-15 at 18:21 Reason: Removed My Vote just added
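For anyone who wants to check the "Mult% is close to the RAM ratio" observation, here is a minimal Python sketch. It uses only the four RAM / B2-Mult pairs from the first table in the post above, with 24 GB as the reference run. The names `runs` and `ratios` are made up for illustration and are not part of Prime95.

```python
# (RAM in GB, B2 multiplier) copied from the first table; 24 GB is the reference.
runs = [(24, 327), (16, 227), (12, 173), (8, 124)]

ref_ram, ref_mult = runs[0]


def ratios(ram_gb, b2_mult):
    """Return (B2-multiplier ratio, RAM ratio) relative to the 24 GB run."""
    return b2_mult / ref_mult, ram_gb / ref_ram


for ram_gb, b2_mult in runs:
    mult_ratio, ram_ratio = ratios(ram_gb, b2_mult)
    print(f"{ram_gb:>2} GB  B2-Mult {b2_mult:>3}  "
          f"Mult% {mult_ratio:6.1%}  RAM% {ram_ratio:6.1%}")
```

On these four data points the multiplier ratio sits a few percentage points above the RAM ratio in every row, so "close" seems fair, but it is not an exact proportionality.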
2022-01-16, 02:20   #47
axn

Jun 2003

12357₈ Posts

Quote:
 Originally Posted by petrw1
What new B1 do I need given the same B2-Mult to get the same Pct.? (B1-M is Bound 1 in Millions)
Same Pct. but GDs drops.
Code:
RAM  B2-Mult  B1-M  B2-M    Pct.   GDs
24   327      0.8   261.60  5.41%  15.42
16   227      0.96  217.92  5.41%  13.13
12   173      1.1   190.30  5.40%  11.72
8    124      1.32  163.68  5.41%  10.45

This B1-M very nearly fits sqrt(ref RAM/allocated RAM) * B1@ref RAM !!! Also, even though it reports lower GHz-days, I suspect that the actual run times are pretty close or even increasing, since those GD numbers are from pre-30.8.

Last fiddled with by axn on 2022-01-16 at 02:21
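The square-root rule of thumb above can be checked against the quoted table with a few lines of Python. This is only a sketch: `REF_RAM_GB`, `REF_B1_M`, and `predicted_b1` are illustrative names, and the B1 values are the ones from the "same Pct." table, nothing more.

```python
import math

REF_RAM_GB = 24   # reference RAM allocation
REF_B1_M = 0.8    # B1 (in millions) used at the reference RAM

# (RAM in GB, B1 in millions) from the quoted "same Pct." table.
table = [(24, 0.8), (16, 0.96), (12, 1.1), (8, 1.32)]


def predicted_b1(ram_gb):
    """B1 (millions) suggested by sqrt(ref RAM / allocated RAM) * B1 at ref RAM."""
    return math.sqrt(REF_RAM_GB / ram_gb) * REF_B1_M


for ram_gb, b1 in table:
    p = predicted_b1(ram_gb)
    print(f"{ram_gb:>2} GB  table {b1:.2f}M  predicted {p:.2f}M  "
          f"diff {(p - b1) / b1:+.1%}")
```

The prediction runs slightly high as RAM shrinks (roughly 5% at 8 GB), so it looks like a decent first guess rather than an exact law.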

2022-01-16, 04:03   #48
petrw1
1976 Toyota Corona years forever!

"Wayne"
Nov 2006
Saskatchewan, Canada

2²×5×7×37 Posts

Quote:
 Originally Posted by axn This B1-M is very nearly fitting with sqrt(ref RAM/allocated RAM) * B1@ref RAM !!! Also, even though it says it is lower GHzD, I suspect that the actual runtimes are pretty close or even increasing, since those GD numbers are from pre-30.8
With the same PC I increased B1 from .8M to 1.1M and decreased RAM from 24GB to 12GB.
I had a similar success rate: 5.35% vs 5.41%.
The run time was longer: Stage 1 took about 15% longer, as expected.
Stage 2 did have a lower B2 multiplier (181 vs 327), but with half as much RAM it took close to the same time as the prior run with 24GB of RAM.

2022-01-16, 05:49   #49
petrw1
1976 Toyota Corona years forever!

"Wayne"
Nov 2006
Saskatchewan, Canada

2²·5·7·37 Posts

Quote:
 Originally Posted by Prime95 Please verify that 2.2 number, it was just a rough guess.
i7-7820x 24GB RAM
20.8M .8M/261.6M(B2=327xB1) - Stage1: 10 Min / Stage2: 9 Min = 19 Min Total - 5.35% / 17.6777 GhzDays
10.4M 1.76M/1169M(B2=664xB1) - Stage 1: 22 Min / Stage2: 13 Min = 35 Min Total - 6.74% / 36.5 GhzDays
I ran both tests with TF=75 bits rather than the actual TF level of 74 for the 10.4M. Should I have?
For an exponent 50% smaller, 2.2x B1 seems to be too much, especially for Stage 1 run times.

10.4M 75Bits 1.2M/786M (B2=655xB1) - Stage 1: 11 Min / Stage 2: 9 Min = 20 Min Total - 6.10% / 24.5581 GhzDays
10.4M 72Bits 1.2M/846M (B2=705xB1) - Stage 1: 11 Min / Stage 2: 9.5 Min = 20.5 Min Total - 7.89% / 26.3972 GhzDays
1.5x seems a good fit, at least for this PC and for 20.8M vs 10.4M.
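To put numbers on "2.2x is too much, 1.5x is a good fit", here is a small Python sketch that compares each 10.4M run with the 20.8M reference run. The B1 values and total minutes are copied from the lines above; `reference`, `candidates`, and `scaling` are just illustrative names.

```python
# (label, B1 in millions, total run time in minutes) from the runs listed above.
reference = ("20.8M, B1=0.8M", 0.8, 19.0)
candidates = [
    ("10.4M, B1=1.76M (75 bits)", 1.76, 35.0),
    ("10.4M, B1=1.2M (75 bits)", 1.2, 20.0),
    ("10.4M, B1=1.2M (72 bits)", 1.2, 20.5),
]


def scaling(run, ref=reference):
    """Return (B1 ratio, total-run-time ratio) of a run relative to the reference."""
    _, b1, minutes = run
    _, ref_b1, ref_minutes = ref
    return b1 / ref_b1, minutes / ref_minutes


for run in candidates:
    b1_ratio, time_ratio = scaling(run)
    print(f"{run[0]:<28} B1 x{b1_ratio:.2f}  run time x{time_ratio:.2f}")
```

On this PC the 2.2x B1 run takes about 1.8 times as long as the reference, while the 1.5x runs land within about 5 to 8% of it.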

 2022-01-16, 09:44 #50
2022-01-16, 10:26   #51
axn

Jun 2003

5359₁₀ Posts

Quote:
 Originally Posted by petrw1 i7-7820x 24GB RAM 20.8M .8M/261.6M(B2=327xB1) - Stage1: 10 Min / Stage2: 9 Min = 19 Min Total - 5.35% / 17.6777 GhzDays 10.4M 1.76M/1169M(B2=664xB1) - Stage 1: 22 Min / Stage2: 13 Min = 35 Min Total - 6.74% / 36.5 GhzDays I ran both tests with TF=75 bits rather than the actual TF level of 74 for the 10.4M. Should I have? For exponent 50% smaller, 2.2x B1 seems to be too much; especially for Stage1 run times.
This doesn't make sense. What were the stage 1 FFT sizes, per-iteration timings, and worker/thread configuration for these two runs?

While 2.2 might be a bit high, 2x or 1.9x _should have_ given comparable timings (twice as many iterations, half the per-iteration time). Yet, you're off by 2.2x - it is as if the smaller FFT wasn't any faster at all. Does. Not. Make. Sense.

Last fiddled with by axn on 2022-01-16 at 10:27

2022-01-17, 04:13   #52
petrw1
1976 Toyota Corona years forever!

"Wayne"
Nov 2006
Saskatchewan, Canada

2²×5×7×37 Posts

Quote:
 Originally Posted by axn This doesn't make sense. What were the stage 1 FFT sizes, per iteration timings and worker/thread configuration for these two runs? While 2.2 might be a bit high, 2x or 1.9x _should have_ given comparable timings (twice as many iterations, half the per-iteration-time). Yet, you're off by 2.2x - it is as if the smaller FFT wasn't any faster at all. Does. Not. Make. Sense.
I agree it seems odd; it could be a quirk in my PC.
Even with version 29 of Prime95 I got the best P-1 throughput with 8 Cores / 1 Worker.

Exp / B1 : FFT1 / FFT2 : Stage1 / Stage2
20.8M / .8M : 1152K / 1280K : 9 Min / 9 Min
10.4M / 1.2M : 560K / 640K : 15 Min / 9 Min --- Not sure why Stage1 is slow here
5.2M / 1.8M : 280K / 320K : 8 Min / 7 Min --- But the times seem better here
Yes, I'm thinking 1.5x is too low.

Does anyone else want to run a few tests?
We are trying to determine how much to increase B1 when the exponent halves to get the same run time.
We think it is about 2x.

Last fiddled with by petrw1 on 2022-01-17 at 04:15
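For comparison, here is a rough Python sketch of the "twice the iterations, half the per-iteration time" reasoning applied to the three runs above. It rests on two stated assumptions: stage 1 needs about 1.44 * B1 squarings, and the time per squaring is simply proportional to the stage 1 FFT length (ignoring the log factor, cache effects, and threading). The names `runs`, `relative_cost`, and `predicted_minutes` are made up for illustration.

```python
# (exponent, B1, stage 1 FFT length in K, observed stage 1 minutes) from the table above.
runs = [
    ("20.8M", 800_000, 1152, 9.0),
    ("10.4M", 1_200_000, 560, 15.0),
    ("5.2M", 1_800_000, 280, 8.0),
]


def relative_cost(b1, fft_k):
    """Assumed stage 1 cost: ~1.44 * B1 squarings, each proportional to FFT length."""
    return 1.44 * b1 * fft_k


_, ref_b1, ref_fft, ref_minutes = runs[0]
ref_cost = relative_cost(ref_b1, ref_fft)


def predicted_minutes(b1, fft_k):
    """Scale the observed 20.8M stage 1 time by the assumed relative cost."""
    return ref_minutes * relative_cost(b1, fft_k) / ref_cost


for label, b1, fft_k, observed in runs:
    print(f"{label:>6}  predicted {predicted_minutes(b1, fft_k):5.1f} min  "
          f"observed {observed:5.1f} min")
```

Under this simple model the 10.4M run would be expected to finish stage 1 in under 7 minutes instead of the observed 15, and the 5.2M run in about 5 minutes instead of 8, which is the mismatch being discussed. The model is deliberately crude, so it only shows that the smaller FFTs are not delivering a proportional speedup here; it says nothing about why.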

 2022-01-17, 05:37 #53
firejuggler

I can give you timings for my working range:
3 cores / 1 worker, 10 GB of memory
8.5M/1.56M: 448K / 512K : 1550 sec / 1000 sec
2022-01-17, 08:24   #54
Luminescence

Oct 2021
Germany

32·11 Posts

Quote:
 Originally Posted by petrw1 I agree it seems odd; It could be a quirk in my PC. Even with version 29 of Prime95 I got the best P-1 thruput with 8 Cores/ 1 Worker. Exp / B1 : FFT1 / FFT2 : Stage1 / Stage2 20.8M / .8M : 1152K / 1280K : 9 Min / 9 Min 10.4M / 1.2M : 560K / 640K : 15 Min / 9 Min --- Not sure why Stage1 is slow here
I have noticed the same thing with smaller FFTs sometimes being slower. That’s not a quirk of your PC, but seemingly a quirk of AVX-512 FFTs.

I noticed that some FFTs are faster when using one worker on an 18-core CPU (Xeon W-2295), but slow down when using two or three workers. But FFTs that were slower to begin with (one worker) do not slow down when using multiple workers.

2022-01-17, 16:33   #55
petrw1
1976 Toyota Corona years forever!

"Wayne"
Nov 2006
Saskatchewan, Canada

1010000111100₂ Posts

Quote:
 Originally Posted by firejuggler I can give you timing for my working range 3core/1 worker 10 Gb of mem 8.5M/1.56M: 448k/ 512k : 1550 sec/1000 sec
Thanks.
Now could you try one or both of these to see if the run times are about the same?
- an exponent half the size with double the B1
- an exponent double the size with half the B1

