20040919, 13:59  #23 
Aug 2002
Termonfeckin, IE
2768_{10} Posts 
One last question: For these numbers GMPECM's default B2 is 2.5577e10, whereas mprime/Prime95 is of course limited to 4e9. I have started running these numbers with B2=4e10.
Can anyone tell me if it is better to stick with the default bounds? Or should I go for the maximum of 1.8e11? And how will these curves be reported to George, since one curve with B2=4e10 is obviously worth more than a curve with B2=4e9?
20040919, 14:34  #24  
Aug 2002
Termonfeckin, IE
2^{4}×173 Posts 
Three posts in a row!
A couple more questions. Is the 5.1beta for Athlon that is posted on the primenumbers yahoogroups site the fastest, or is there a faster build available? Alex mentioned above: Quote:
So I'm assuming that the steps outlined two posts above are correct and sufficient.

20040919, 18:15  #25  
"Nancy"
Aug 2002
Alexandria
2,467 Posts 
Quote:
Here are some (approximate) expected numbers of curves for B1=11M: Code:
B2=1.1e9: 11000
B2=  5e9:  8200
B2= 10e9:  7100
B2= 20e9:  6200
B2= 40e9:  5500
If you do stage 2 with B2=40G, each curve has about twice as high a chance of finding a factor as with B2=1.1e9, so when reporting your effort, simply double the number of curves. Quote:
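Alex's reporting rule can be sketched numerically: the relative worth of a curve at a larger B2 is just the ratio of expected curve counts. A rough sketch, using the approximate counts from the table above (the function name and reference bound are my own choices for illustration):

```python
# Approximate expected curve counts for B1=11M at various B2 (from the table above).
expected_curves = {
    1.1e9: 11000,
    5e9:    8200,
    10e9:   7100,
    20e9:   6200,
    40e9:   5500,
}

def curve_credit(curves_run, b2_used, b2_reference=1.1e9):
    """Scale curves run at b2_used into equivalent curves at b2_reference.

    A curve's chance of success is roughly 1/expected_curves, so the
    conversion factor is expected_curves[reference] / expected_curves[used].
    """
    factor = expected_curves[b2_reference] / expected_curves[b2_used]
    return curves_run * factor

# 100 curves at B2=40e9 report as ~200 curves at the B2=1.1e9 reference.
print(curve_credit(100, 40e9))
```

This is only a first-order approximation, of course; the per-curve success probabilities themselves are estimates.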
I'm not 100% certain that reading unreduced residues from a save file works correctly. In the case of base-2 reduction, which is what will typically be used when resuming Prime95 residues, I expect it does. If, however, the remaining cofactor is only ~70% of the full 2^n+1 number, or for Aurifeuillian factors, MODMULN will be used, which may have problems. I did a test just now and it worked, but I haven't looked at the code hard enough yet to be sure it always will. If you can arrange it at all, I'd recommend using a patched 5.0.3 for stage 2. I can't offer a Windows binary, though, lacking Windows.

Alex

EDIT: Duh. Of course the residue gets properly reduced when it's converted to an mpres_t type. No patch necessary, but I'll attach it anyway.

Last fiddled with by akruppa on 20040919 at 18:22

20040919, 21:49  #26 
Aug 2002
Termonfeckin, IE
2^{4}×173 Posts 
Thanks Alex for your detailed post. I used 5.1beta because that was the only version for which I could find a compiled Athlon binary. I do not have the appropriate tools for compiling on my Athlon so I took the easy way out.
I guess I will download them now. :sigh: I have since found an Athlon-compiled version of ecm5.0c, which I assume is the same as 5.0.3. For 2^1033-1, stage 2 was significantly slower than with 5.1beta. The reason, it seems, is that it chooses k=5 instead of the k=4 that 5.1beta picks. As a result the B2' value is higher, and the d and dF values are different as well. The slowdown was about 16%, so it cannot be accounted for by the higher B2' alone. I also ran the test posted on the 5.0.3 page and it worked fine, so if that checks for the MODMULN bug, 5.1beta does not have it.

On a 2.8 P4, stage 1 on M4132 with mprime takes 440 seconds. I am running four separate stage 2s on an Athlon 1333 for 1033, 1033+, 2066L and 2066M, and they take from 635 to 825 seconds depending on the size of the cofactor. So I should certainly reduce the B2 bounds. I will use the table that you posted above, together with the time it takes to run stage 2 to these limits, to figure out the optimal B2 bound. It is likely to be less than even the default.

Also, for both 1033 and 1033+ GMPECM recognized that the cofactors were for these numbers and used the special mults. Your edit confirms my suspicion that the residues are indeed reduced.
20040920, 18:04  #27 
Aug 2002
Termonfeckin, IE
101011010000_{2} Posts 
Ok! So I finally downloaded and compiled ECM 5.0.3 with GMP 4.1.3. I ran it on the P4 and found that k=3 or k=4 was usually optimal, and that ecm does not always choose the best k, though it is usually reasonably close.
I ran GMPECM for various values of B2. I found that 40e9 is indeed very high and that the best performance, according to the curves-required table given by Alex a few posts above, is actually at 1.1e9, i.e. B2=100*B1! I give the running times below for P1033, but since the cofactors are similar in size, the numbers, or at least the trends, are representative.

Stage 2 time only: Code:
B2=1.1e9: 11000 *  33s =  363k secs
B2=  5e9:  8200 * 101s =  828k secs
B2= 10e9:  7100 * 152s = 1079k secs
B2= 20e9:  6200 * 257s = 1593k secs
B2= 40e9:  5500 * 496s = 2728k secs
Including the 110s stage 1 share per curve (440s for M4132, split over the four cofactors): Code:
B2=1.1e9: 11000 * (33+110)s  = 1573k secs
B2=  5e9:  8200 * (101+110)s = 1730k secs
B2= 10e9:  7100 * (152+110)s = 1860k secs
B2= 20e9:  6200 * (257+110)s = 2275k secs
B2= 40e9:  5500 * (496+110)s = 3333k secs
I'm just posting this so I can get a sanity check on these numbers from Geoff or Alex. Is it plausible for B2=100*B1 to be the optimal bound? I thought GMPECM was more efficient for stage 2, but I guess the difference is not enough to justify raising the bounds for such small numbers. Oh yeah, BTW: I ran mprime for stage 2 on M4132 with the same bound B2=1.1e9 and it took 192 seconds, whereas all four GMPECM stage 2s together took 142 seconds. So it is still worth doing separate stage 2s on the numbers using GMPECM.
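As a sanity check, the totals in both tables follow mechanically from the per-curve times. A short sketch that recomputes them (the 110s stage 1 share assumes the 440s M4132 stage 1 time split evenly over the four cofactors):

```python
# (B2, expected curves, per-curve stage 2 seconds) taken from the tables above.
data = [
    (1.1e9, 11000, 33),
    (5e9,    8200, 101),
    (10e9,   7100, 152),
    (20e9,   6200, 257),
    (40e9,   5500, 496),
]
stage1_share = 440 / 4  # 440s of mprime stage 1 on M4132, split over four cofactors

for b2, curves, t2 in data:
    stage2_total = curves * t2                 # stage 2 only
    full_total = curves * (t2 + stage1_share)  # stage 1 + stage 2
    print(f"B2={b2:8.1e}: {stage2_total/1000:5.0f}k / {full_total/1000:5.0f}k secs")

# The minimum expected total time is at the smallest B2, i.e. B2 = 100*B1.
best_b2 = min(data, key=lambda row: row[1] * (row[2] + stage1_share))[0]
```

Under these measured times the minimum does land at B2=1.1e9, though as noted below the optimum is quite flat and shifts if stage 2 runs on a machine where gmpecm is relatively faster.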
20040920, 18:33  #28  
"Bob Silverman"
Nov 2003
North of Boston
2^{2}×1,877 Posts 
Quote:
What does "optimal" mean? What are you optimizing? Won't the optimal k depend on the step 2 limit chosen? {I assume here that k is the number of partitions for step 2; one partitions the step 2 interval into pieces}

If anyone would bother to read my joint paper with Sam Wagstaff Jr., they would find that in order to "maximize the probability of success per unit time", you should spend the same amount of time in step 2 as in step 1. Adjust your limits accordingly. Before one can "optimize", one must specify what is being optimized.

Theoretically, with an "idealized FFT implementation of step 2", one could take step 2 to (step 1)^2 in an amount of time equal to that spent in step 1. This is the theoretical best bound via FFTs. GMPECM does indeed use better algorithms for step 2 than those used by Mprime. Reading Peter Montgomery's dissertation would also be useful. He did a superb job.

Last fiddled with by smh on 20040920 at 20:35 Reason: Fixed quotes

20040920, 19:04  #29 
"Nancy"
Aug 2002
Alexandria
2467_{10} Posts 
I'm a little surprised, too, I'd have expected that gmpecm makes larger B2 worthwhile.
But then again, Prime95 is really awfully efficient on a P4, whereas all-integer GMP is not. M4132 makes excellent use of Prime95's length-192 transform (threshold 4355), and at a cofactor size of ~250 digits, GMP really starts lagging behind Prime95's DWT.

For comparison: on an Athlon XP 3000+, stage 1 with mprime takes 730s, and stage 2 on the c255 of P1033 with gmpecm 5.0.3 takes 26.5s. That ratio is more than twice what you see on the P4. If everything were done on an Athlon, B2=5e9 or 10e9 would be optimal. If you can do stage 2 on an Athlon, a slightly larger B2 may become worthwhile. Otherwise just stick with B2=1.1e9; at least it saves a little time wrt Prime95.

Alex
20040920, 21:06  #30 
Aug 2002
Termonfeckin, IE
2768_{10} Posts 
Bob, you are correct. It is bad form to say "optimal" without saying what I was optimizing. I was running gmpecm 5.0.3 on a P4 under Linux, and for the same B2 bound I ran stage 2 with different k values (yes, the number of big steps) and observed how the running time varies with k.
The times I posted above should be taken with a slight pinch of salt, since for different k values, the B2' and b2 values chosen are different. Here b2 is the size of a step and B2' is simply k*b2, chosen so that it exceeds the specified B2. So yes, k does depend on the limit chosen for stage 2, and what's more, k affects the "actual" limit for stage 2 as well. Regardless, the differences in times for similar B2' values resulting from different k values are significant enough to say that the value of k can affect the efficiency of step 2.

I did skim through Montgomery's dissertation and have downloaded your paper, though I have not had time to read it yet. I will likely do that this weekend. The rule of spending the same time in stage 1 and stage 2 is something I did not know of. Alex had mentioned previously that the amount of time spent in stage 2 should be half the time spent in stage 1. But since he said this in the thread linked to by geoff, where geoff was doing stage 1 using mprime on M1574 and then stage 2 using gmpecm on P787 and M787 separately, I'm assuming what he meant was that the stage 2 time for either P787 or M787 should be half the total time spent in stage 1 on M1574.

I would be inclined to follow your advice, except that I am running stage 1 on a P4 2.8 with mprime and stage 2 on an Athlon 1333 with gmpecm, so it is a little hard to match the timings. Stage 1 with mprime runs approximately 3 times faster on the P4, whereas stage 2 with gmpecm is only 50% faster. As Alex also indicates in his post, that probably muddles things up a bit. Before I have time to sit down and read your paper and PM's dissertation, I will probably go ahead and run a few curves such that the time spent in stages 1 and 2 is approximately the same, after taking into account the difference in efficiency of mprime and gmpecm on the P4 and Athlon. At this point it seems that this is at B2=5e9 or so.
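The equal-time rule can be turned into a rough estimate by interpolating the measured per-curve stage 2 times against the per-curve stage 1 time. A sketch only: it uses the P4 timings from earlier in the thread (110s stage 1 share, i.e. 440s split over four cofactors) and ignores the P4/Athlon speed correction discussed above:

```python
# Measured per-curve stage 2 times on the P4 (B2, seconds), from the earlier table.
measurements = [(1.1e9, 33), (5e9, 101), (10e9, 152), (20e9, 257), (40e9, 496)]
stage1_share = 440 / 4  # 440s of mprime stage 1 on M4132, split over four cofactors

def b2_for_equal_time(target, points):
    """Linearly interpolate B2 so that stage 2 time matches the stage 1 time."""
    for (b2_lo, t_lo), (b2_hi, t_hi) in zip(points, points[1:]):
        if t_lo <= target <= t_hi:
            frac = (target - t_lo) / (t_hi - t_lo)
            return b2_lo + frac * (b2_hi - b2_lo)
    raise ValueError("target time outside measured range")

b2 = b2_for_equal_time(stage1_share, measurements)
print(f"B2 ~ {b2:.2e}")  # lands between 5e9 and 10e9
```

Which is consistent with the "B2=5e9 or so" conclusion above; adjusting the target time for the relative machine speeds would shift it somewhat.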
20040921, 10:44  #31 
Aug 2002
Termonfeckin, IE
2^{4}·173 Posts 
A question
I just started using Richard McIntosh's maple program to calculate the number of curves required. With p=45.5 (# of digits) and B1=11e6 I got the following number of curves:
Code:
B2=11e8: 10200
B2=22e8:  8657
B2=26e9:  5227 (GMPECM default)
B2=40e9:  4830
These numbers are less than what Alex posted, and this presumably does not even take into account the Brent-Suyama extension. I also cross-checked with the numbers at http://www.mersenneforum.org/showthread.php?t=2326 and found that the maple program is underestimating relative to what has been posted there as well. However, if the Brent-Suyama extension figures posted by Alex are taken into account, the maple program is overestimating. Any clue as to why this is happening and which I should trust?

Last fiddled with by garo on 20040921 at 10:47
20040921, 12:40  #32  
"Bob Silverman"
Nov 2003
North of Boston
2^{2}·1,877 Posts 
Quote:
Overestimating... By how much? If you read my paper with Sam Wagstaff Jr., you will discover that the response surface [probability of success with given limits and multiple curves] is VERY shallow in the neighborhood of the optimum. You are trying to estimate the number of curves with much too fine a precision. I would want multiprecise versions of Dickman's function before trying to estimate #curves with the degree of precision you (seemingly) want. The response surface is just too flat in the region of the optimum.

Furthermore, do you know how #curves gets computed? It is 1/(per-curve probability). Now look at 1/(per-curve probability - epsilon). #curves is quite sensitive to small changes in the estimated probability.

Furthermore, trying to compute #curves for 45.5 digits is just silly in my opinion. I presume that you mean sqrt(10)*10^44 for 45.5 digits? I would use the GMP default. It does a good job balancing step 1/step 2 times.

Note also that the probability depends on whether curves are chosen with group order a priori divisible by 12, or whether curves are chosen so the order is divisible by 8 with prob = .5 and 16 with prob = .5.
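Bob's sensitivity point can be illustrated with a toy number (the probability and error values here are made up for illustration, not taken from the thread): since #curves = 1/p, a small error epsilon in the estimated per-curve probability p shifts the curve count noticeably.

```python
# Toy illustration of how #curves = 1/p reacts to a small error in the
# per-curve success probability p. The values are invented for illustration.
p = 1.0e-4    # estimated per-curve probability of success
eps = 5.0e-6  # a 5% error in that estimate

curves = 1 / p                      # nominal curve count
curves_perturbed = 1 / (p - eps)    # curve count under the perturbed estimate
relative_change = curves_perturbed / curves - 1

print(curves)
print(curves_perturbed)
print(f"{relative_change:.1%}")
```

A 5% underestimate of p inflates the curve count by a bit more than 5%, and the effect grows as the error grows, which is why a 5% disagreement between two estimators is not alarming near a flat optimum.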

20040921, 12:57  #33 
Aug 2002
Termonfeckin, IE
2^{4}·173 Posts 
Hi Bob,
Yes, I do know how the # of curves is calculated; I looked through the maple code. Thanks for pointing out that I shouldn't try to estimate the number of curves with too fine a degree of precision. The maple program's output is approximately 5% lower than the estimates posted by geoff, and 5% higher than the estimates posted by Alex once the Brent-Suyama extension is taken into account.

Also, if you read the post by philmoore where he posted the maple program, you will find that it explains the reason for choosing 45.5 digits. It is indeed a way to estimate the probability of finding a factor of size sqrt(10)*10^44.

I'll hold my tongue (or rather my fingers) till I read your paper and have some time to think about it. But in the meantime, please try not to treat me just like those jokers who keep coming up with super-efficient new ways to prove primality