mersenneforum.org Default tests_saved value for FTC PRP and standalone P-1

2021-10-31, 16:51   #1
techn1ciaN

Oct 2021
U.S. / Maine

22×3×11 Posts

Default tests_saved value for FTC PRP and standalone P-1

I've become curious about why PRP (for FTCs with no P-1 done) and P-1 lines from PrimeNet still have the P-1 tests_saved parameter set to 2. With GEC + proof as the only current FTC option, the fraction of found P-1 factors that would actually save two full-length primality tests must be in the hundredths of a percent, if not lower. Is it unreasonable to think that P-1 might actually be a net loss of cycles at present, if the effort to spend on it is computed under the assumption that more effort could be saved on the other end than actually can be?

This question would be purely an exercise in curiosity if the people running standalone P-1 were easily keeping up with the FTC wavefront, but that does not seem to be the case: the great majority of FTCs in Cat 2 or above go out with no P-1 done. So it seems to me that any reasonable way of increasing P-1 throughput would be worthwhile, especially since P-1 is more likely to be done well by someone who runs it standalone than by an FTC user who only runs it incidentally.

I was surprised to find almost no existing discussion of this beyond a few people mentioning in passing that they change the parameter to 1 manually. (That is unfortunately not practical for me: I am not at my computer frequently enough, and I do not know enough programming to write a script that could do it.)
2021-11-01, 03:31   #2
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2×5×13×47 Posts

Quote:
 Originally Posted by techn1ciaN I was surprised to find almost no existing discussion on this
It does appear to be time to change the server to give out assignments with a tests_saved value of 1 or ~1.048.
That would handle prime95/mprime, but not gpuowl, which needs explicit bounds given.

I picked ~107M to be representative, and checked for PRP and for LL results.

107M-107.1M PRP at 10/31/2021 ~2045 UTC:
37 unverified PRP
1150 verified PRP
The unverified ones range from 2019-12-13 to today; about half are more than a week old.
Almost all the verified were by proof; one was by DC.

107M-107.1M LL at 10/31/2021 ~2051 UTC:
8 verified results (4 exponents & their DCs), ranging 2016-02-21 to 2021-04-09
29 unverified, ranging 2016-03-05 to 2021-05-09

So the ~107M mix is 1148 PRP proof, 1 PRP DC, and 37 PRP tbd (probably 18 more PRP DC & 19 more PRP proof);
33 LL needing eventual DC, & probably 1 LL TC;
projected 19 PRP DC & 33 LL DC = 52 DC, vs. 1148 + 19 = 1167 PRP proof; the DC fraction is 52/(52+1167) = 4.27%.

Computing a little more carefully, including approximate proof generation and certification time:
PRP w/o proof: 19 * 2 = 38
LL & DC & TC etc.: 33 * 2.0408 = 67.35
PRP with proof: 1167 * ~1.005 = 1172.84
Sum of effort = 1278.19, over 19 + 33 + 1167 = 1219 exponents; tests-saved equivalent per exponent ~ 1.0486.

# saved ~ 1.05 on average
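The weighted average works out like this in a few lines of Python (a recomputation sketch; the counts follow the 52/(52+1167) mix above, i.e. 19 PRP first tests needing DC, 33 LL, and 1167 PRP with proof):

```python
# Recompute the average number of primality tests a found P-1 factor saves,
# weighted by the projected verification mix at ~107M (figures from this post).
mix = [
    (19,   2.0),     # PRP first tests without proof: each needs a full DC
    (33,   2.0408),  # LL first tests: DC plus occasional triple-check
    (1167, 1.005),   # PRP with proof: only proof generation + cert overhead
]
total_cost = sum(count * cost for count, cost in mix)
total_exponents = sum(count for count, _ in mix)
avg_tests_saved = total_cost / total_exponents
print(f"{avg_tests_saved:.4f}")  # ~1.05 tests saved per factor found
```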

2021-11-04, 13:13   #3
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

17DE16 Posts
Approx. 1-test-saved bounds appear optimal

I ran a comparison timing with prime95 of 1- versus 2- or 3-tests-saved wavefront P-1 assignments, on
i7-1165g7, 16GiB single DIMM ram, prime95v30.6b4 on Windows 10 x64 Home 21H1
PFactor=aid,1,2,107181343,-1,76,1
PFactor=aid,1,2,107181367,-1,76,2
PFactor=aid,1,2,107191871,-1,76,3
(since the exponents only differ by ~0.01%, the run times for same # of tests saved would probably differ by only ~0.021%)

PRP=aid,1,2,107184463,-1,76,2 program estimated runtime 15d 9:45

bounds for 1 test saved 403000, 15491000, determined by prime95
bounds for 2 tests saved 851000, 37318000 " "
bounds for 3 tests saved 1284000, 61709000 " "
runtime for 1 saved est 6:09; 10/31/21 15:16 start s1, 16:52 s1 gcd done, 20:17 s2 & gcd done, actual 5:01 elapsed, 7.2853 GHD, NF
runtime for 2 saved est 13:59; 10/31/21 20:17 start s1, 23:44 s1 gcd done, 10:06 s2 & gcd done, actual 13:49 elapsed, 16.8766 GHD, NF (13.817 hours)
runtime for 3 saved est 23:30; 11/1/21 10:06 start s1, 15:37 s1 gcd done, 11/2 12:07 s2 & gcd done, actual 26:01 elapsed, 27.2143 GHD
odds of factor found for 1 test saved bounds, 3.36%
odds of factor found for 2 tests saved bounds, 4.37%
odds of factor found for 3 tests saved bounds, 5.00%
Run time on a 107M PRP (M107184463) is estimated as 15d 9:45, or 369.75 hours on the same system.
Adjusting that by p^2.1 to 107181367, PRP run time is estimated as 369.73 hours.

1-test-saved bounds cost 5 hr 1 min with a 3.36% chance of saving ~15d 9:44: 5.017 - 369.73*.0336 = -7.406 hours (7.4 hours likely saved; this is the case for an LL DC, or a PRP DC without proof)
1.005 tests actually saved (1 PRP/GEC/proof & cert): 5.017 - 369.73*.0336*1.005 = -7.468 hours
2 tests actually saved (2 PRP/GEC without proof): 5.017 - 369.73*.0336*2 = -19.829 hours
2.0408 tests actually saved (2 LL & occasionally a third, etc.): 5.017 - 369.73*.0336*2.0408 = -20.336 hours
2-tests-saved bounds cost 13:49 with a 4.37% chance: 13.817 - 369.73*.0437 = -2.341 hours if only 1 test actually saved, -18.498 if 2.
skipping P-1, 0. That's the worst case P-1 choice, barring extremely high bounds that cost more than they save.

We want the lowest number possible in the following (greatest amount of time saved by doing P-1, largest negative magnitude)
At 1.0477 saved average:
1-saved bounds 5.017-369.73*.0336*1.0477 = -7.999 hours (saves 2.15% of a PRP/GEC/proof&cert)
2-saved bounds 13.817-369.73*.0437*1.0477 = -3.111 hours (0.84% of a PRP/GEC/proof&cert)
ratio 7.999/3.111 = 2.571
Difference = 4.888 hours, which is ~1.32% of the estimated PRP run time; 1-test-saved bounds are faster.
If this holds similarly for other hardware too, we could speed up the GIMPS project progress by ~1.32%*(369.73/(369.73+13.817)) ~ 1.27% by switching the server from issuing 2-tests-saved P-1 assignments to 1-test-saved. (probably in actuality a little less because of gpuowl's bounds selection.)
3-saved bounds 26.017-369.73*.050*1.0477 = +6.649 hours (costs several hours per exponent on average)
Running excessively high P-1 bounds can be worse than skipping P-1 entirely.
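The per-bounds comparisons above all reduce to one expected-value formula: net hours = P-1 cost - (test cost * factor probability * tests actually saved). A minimal Python sketch with the numbers measured on this i7-1165g7:

```python
def pm1_net_hours(pm1_cost, prp_cost, p_factor, tests_saved):
    """Expected net hours from running P-1 before a primality test.

    pm1_cost    - hours spent on P-1 stages 1+2 at the chosen bounds
    prp_cost    - hours for one full primality test (~369.73 h at 107M here)
    p_factor    - probability the chosen bounds find a factor
    tests_saved - tests actually avoided per factor found (~1.05 on average)

    Negative result = net savings; positive = P-1 costs more than it saves.
    """
    return pm1_cost - prp_cost * p_factor * tests_saved

# The three bound choices timed above:
print(pm1_net_hours( 5.017, 369.73, 0.0336, 1.0477))  # ~ -8.0 h, 1-saved bounds
print(pm1_net_hours(13.817, 369.73, 0.0437, 1.0477))  # ~ -3.1 h, 2-saved bounds
print(pm1_net_hours(26.017, 369.73, 0.0500, 1.0477))  # ~ +6.6 h, 3-saved bounds
```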

All the above is for computation of P-1 that is not integrated with the PRP computation for the same exponent, as in prime95/mprime, CUDAPm1, Mlucas, or gpuowl 6.x. Gpuowl 7.x reuses the powers of 3 computed during the initial portion of the PRP test to help obtain the power needed for P-1 stage 1, reducing the combined cost. Since that cuts P-1 stage 1 cost to about 10% of the independent computation, it would reduce the total additional computation cost for P-1 considerably, favoring larger bounds, especially for stage 1.
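For context on what the stage-1 computation actually is: P-1 stage 1 is a single large exponentiation 3^E mod M_p, with E the product of all maximal prime powers up to B1, followed by a GCD. A toy Python illustration on M67 (this naive big-integer version is for exposition only; real clients use FFT multiplication, and gpuowl 7.x shares the powering of 3 with the PRP residue):

```python
from math import gcd

def pm1_stage1(p, B1, base=3):
    """Toy P-1 stage 1 on N = 2^p - 1: compute base^E mod N, with E the
    product of all maximal prime powers <= B1, then gcd(result - 1, N).
    Any prime factor q of N whose q-1 is B1-smooth divides the gcd."""
    N = (1 << p) - 1
    a = base
    is_comp = bytearray(B1 + 1)       # simple sieve of Eratosthenes
    for q in range(2, B1 + 1):
        if is_comp[q]:
            continue
        for m in range(q * q, B1 + 1, q):
            is_comp[m] = 1
        qe = q                        # largest power of q not exceeding B1
        while qe * q <= B1:
            qe *= q
        a = pow(a, qe, N)             # fold q^e into the accumulated power
    return gcd(a - 1, N)

# q = 193707721 divides M67, and q - 1 = 2^3 * 3^3 * 5 * 67 * 2677 is
# 3000-smooth, so stage 1 with B1 = 3000 must expose it:
f = pm1_stage1(67, 3000)
print(f)  # a factor of M67 divisible by 193707721
```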

Is there any reason to believe that this result is not representative for other exponents or other processor types? I don't know of any. Perhaps the authors of P-1 and/or PRP capable software know of some.
Attached Files
 P-1 time differential for i7-1165g7.pdf (17.0 KB, 32 views)

Last fiddled with by kriesel on 2021-11-04 at 13:46

2021-11-04, 18:19   #4
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

52·311 Posts

Quote:
 Originally Posted by techn1ciaN I've become curious on why PRP (for FTCs with no P-1 done) and P-1 lines from PrimeNet still have the P-1 tests_saved parameter set to 2.
I agree that setting tests_saved to 1 would be optimal for GIMPS throughput.

However, is a factor worth more than a PRP proof? That's a purely subjective question. Mihai and I agree that a factor is more valuable. A factor can prove a Mersenne number composite in milliseconds. A proof and cert cannot be re-verified in the future, especially since proof files are discarded due to their large size. To a future researcher, our current proof methods amount to "trust us, we proved number x composite".

2021-11-04, 20:36   #5
techn1ciaN

Oct 2021
U.S. / Maine

22·3·11 Posts

Quote:
 Originally Posted by Prime95 Mihai and I agree that a factor is more valuable.
I absolutely agree myself, but I question how relevant this is to the tests_saved value.

GIMPS has a dedicated contingent of people factoring already-tested exponents, using TF deeper than even GPU72 would have given out and P-1 bounds greater than even tests_saved=2 would have set. I see no reason to believe that any large number of them would lose interest, and their effort only gets cheaper as hardware evolves, so it should almost surely reach the current FTC wavefront and beyond given enough time.

I think it is also necessary to consider the practical angle. Good factoring hardware looks different from good FTC hardware, let alone FTC users' actual hardware, and again, standalone P-1 testing is not keeping up with the FTC wavefront. So, an attempt to do better P-1 will certainly get better results within whatever throughput standalone P-1 users can manage, but may still get worse results overall as high-cat FTCs' P-1 is run at the mercy of whatever the PRP tester has available (if they even bothered to change the default resource limits). You do not have to look very hard to find a recently completed exponent with PRP and P-1 done by the same user and poor P-1 bounds or no P-1 stage 2 at all.

The short version of my viewpoint is that an attempt to both save PRP tests and find factors for factors' sake may end up doing neither well.

Last fiddled with by techn1ciaN on 2021-11-04 at 20:41 Reason: Sentence flow

2021-11-04, 20:57   #6
Uncwilly
6809 > 6502

"""""""""""""""""""
Aug 2003
101×103 Posts

101000000001102 Posts

Quote:
 Originally Posted by kriesel I ran a comparison timing with prime95 of 1 versus 2 or 3 saved wavefront P-1 assignments, on
Why did you not try 1.1 tests saved? That takes a little more effort than a 1.0 value, but not much, and it leans a little in favor of finding a factor. From my testing, anything in the hundredths column gets ignored, so the gradations are 1.0, 1.1, 1.2, etc.

2021-11-05, 00:17   #7
techn1ciaN

Oct 2021
U.S. / Maine

22·3·11 Posts

Quote:
 Originally Posted by Prime95 To a future researcher, our current proof methods amount to "trust us, we proved number x composite".
Another point about the increasing cheapness of computational power is that today's difficult check becomes tomorrow's easy verification. An exponent that takes a week to test with a current high-end processor might be verifiable (via the "hard way" of running the full test again) 20 years from now in an hour or two on a midrange laptop.

I have unfortunately lost the specific post but can recall seeing Madpoo say that he recently personally verified the LL for every exponent below 5 million.

2021-11-05, 01:12   #8
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2·5·13·47 Posts

Quote:
 Originally Posted by Uncwilly Why did you not try 1.1 tests saved? That does take a little bit more effort than a 1.0 value, but not much. And it leans a little in favor of finding a factor. From my testing anything in the hundredths column gets ignored. So the gradations are 1.0, 1.1, 1.2 etc.
I started out testing 1 and 2, then added 3, which took more than another day: a "quick" way of getting a coarse read on the tradeoffs. At that point I wanted to post for information and discussion. The zero point was free and was added last. 1.1 and 0.9, for symmetry, could be informative; that's another day of runtime. It might become a reference post eventually. Did you find that it treats 1.01 the same as 1.0, or that it truncates or rounds any .0x? I suppose I could wander over to the source code again.
For anyone still using CUDAPm1 (and I am, a little), I think it's limited to integer values of tests_saved, from 1 to 9. (CUDAPm1 parse.c source fragment:)
Code:
    else if (count_numerical_field == 6) {
        if (proposed_exponent > 0) assignment->ll_saved = (int) proposed_exponent;
prime95v30.6b4 source module ecm.c caps P-1 tests_saved at 10, and has it typed as double.
There's an implicit assumption handling ERROR_RATE that two primality tests will occur, not PRP & proof gen & cert:
Code:
/* Balance P-1 against 1 or 2 LL/PRP tests (actually more since we get a */
/* corrupt result reported some of the time). */

g.ll_testing_cost = (tests_saved + 2 * ERROR_RATE) * n;
instead of tests_saved * (1 + ERROR_RATE)* n
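The difference between the two costings is easy to see numerically: they coincide exactly at tests_saved = 2 and diverge below it. A quick Python sketch (ERROR_RATE = 0.02 here is an illustrative placeholder, not prime95's actual constant):

```python
n = 107_000_000      # exponent, proportional to the cost of one primality test
ERROR_RATE = 0.02    # illustrative placeholder, not prime95's actual constant

def cost_current(tests_saved):
    # prime95 ecm.c form: error overhead sized for two primality tests
    return (tests_saved + 2 * ERROR_RATE) * n

def cost_proposed(tests_saved):
    # error overhead scaled by the number of tests actually saved
    return tests_saved * (1 + ERROR_RATE) * n

for t in (1.0, 1.0477, 2.0):
    print(t, cost_current(t), cost_proposed(t))
# At tests_saved = 2 the forms agree (2 + 2e == 2*(1 + e)); below 2, the
# current form credits P-1 with more saved work than the proposed form.
```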
There's rounding of B1 and B2 upward to the next 1000 in the costing code, so sufficiently small changes in tests_saved would produce the same bounds:
Code:
/* Round up B1 and B2 to nearest 1000 -- just to look pretty */

best[1].B1 = round_up_to_multiple_of (best[1].B1, 1000);
best[1].B2 = round_up_to_multiple_of (best[1].B2, 1000);
Output appears to support hundredths in tests_saved:
Code:
    sprintf (buf, "Assuming no factors below 2^%.2g and %.2g primality test%s saved if a factor is found.\n",
w->sieve_depth, w->tests_saved, w->tests_saved == 1.0 ? "" : "s");
That's as far as I went.
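A Python one-liner equivalent of that helper shows how nearby candidate bounds collapse to the same value (a sketch of the quoted C behavior, not the actual prime95 code):

```python
def round_up_to_multiple_of(x, m):
    """Round x up to the next multiple of m (sketch of the intent of the
    prime95 helper quoted above)."""
    return -(-x // m) * m

# Two nearby candidate bounds collapse to the same "pretty" value:
print(round_up_to_multiple_of(402_417, 1000))  # 403000
print(round_up_to_multiple_of(403_000, 1000))  # 403000
```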

Last fiddled with by kriesel on 2021-11-05 at 01:45

2021-11-05, 01:18   #9
Uncwilly
6809 > 6502

"""""""""""""""""""
Aug 2003
101×103 Posts

101000000001102 Posts

IIRC it truncates. 1.15 was treated as 1.1.
2021-11-05, 03:59   #10
axn

Jun 2003

2×32×293 Posts

Quote:
 Originally Posted by Prime95 However, is a factor worth more than a PRP proof? That's a purely subjective question. Mihai and I agree that a factor is more valuable. A factor can prove a Mersenne number composite in milliseconds. A proof and cert cannot be re-verified in the future, especially since proof files are discarded due to their large size. To a future researcher, our current proof methods amount to "trust us, we proved number x composite".
This is a poor argument. It provides no framework for deciding what the actual "tests saved" value should be. I mean, one could easily ask: "Why stop at 2? Why not 3? Why not 10? Why not eleventy billion?!!!!"

I can suggest an actual framework to reason about this. Leave tests_saved at 1 (or 1.1 or whatever, based on modelling the CERT overhead + abandoned tests, etc.). Compute the optimal B1/B2 as usual. Calculate the savings (in number of squarings). Keep going to higher B1s and recalculate the savings. Stop when the savings fall below (say) 90% of those at the optimal B1/B2. Use this higher B1/B2 for actual work. This gives us increased factor probability without compromising GIMPS throughput "too much". The 90% value could be something else, but at least this gives a way to reason more systematically about the cost/savings rather than pulling an indirect "2 tests saved" out of nowhere.
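That procedure can be made concrete with a toy cost model (everything here is illustrative: the log-shaped factor probability and linear P-1 cost are stand-ins for the real smoothness-probability estimates, not anything prime95 implements):

```python
from math import log

TEST_HOURS = 370.0   # roughly one 107M primality test on the hardware above

def factor_prob(B1):
    # crude stand-in for the real smoothness probability; grows ~log(B1)
    return 0.01 * log(B1 / 1000)

def pm1_cost(B1):
    # P-1 cost roughly linear in B1 (illustrative constant)
    return 1e-6 * B1

def savings(B1, tests_saved=1.0):
    # expected hours saved by running P-1 with this B1 before the test
    return factor_prob(B1) * tests_saved * TEST_HOURS - pm1_cost(B1)

grid = range(100_000, 20_000_001, 100_000)
b_opt = max(grid, key=savings)    # the usual throughput-optimal bound
best = savings(b_opt)
# the suggested rule: push B1 higher until savings drop to 90% of optimal
b_axn = max(b for b in grid if b >= b_opt and savings(b) >= 0.9 * best)
print(b_opt, b_axn)  # b_axn > b_opt: more factoring odds, <=10% lost savings
```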

PS: Does P95 implement PRP + stage 1? If not, is it in the pipeline? I feel that would give a substantial reduction in P-1 cost. In fact, it might even make sense to do stage-1-only P-1 with that (assuming sufficient RAM is available).

Last fiddled with by axn on 2021-11-05 at 04:00

2021-11-05, 06:13   #11
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

52·311 Posts

Quote:
 Originally Posted by axn This is a poor argument.
I cannot disagree. I only offer this as an explanation as to why I've not been in any hurry to adjust the 2 downward.

Quote:
 Does P95 implement PRP+Stage 1? If not, is that in pipeline? I feel like that would give substantial reduction to P-1 cost. In fact, it might even make sense to do Stage-1-only P-1 with that (assuming sufficient RAM is available).
It is not in the works. There are two reasons:
1) Prime95 would require a lot of memory during the PRP+Stage1 part of the PRP test. Thus, not an option that can be turned on in a default install. A prime95 default install is not supposed to interfere with normal work.
2) There is some overhead involved in PRP+Stage1. I need to revisit the process to quantify it -- it's been a long while since I last looked at it. IIRC, with 128 temporaries you get a theoretical 7x increase in stage 1 performance; I don't recall what I expected the overhead to knock that 7x down to.

That said, it is worth implementing. The last year I've been working on needed gwnum library improvements as well as improving P-1 (and P+1/ECM) stage 2 in preparation for implementing this. 30.7 is close to a finished product, but Pavel and I are working on yet another stage 2 improvement!

