mersenneforum.org Nice Result! M971

2004-09-19, 13:59   #23
garo

Aug 2002
Termonfeckin, IE

2768₁₀ Posts

One last question: For these numbers GMP-ECM's default B2 is 2.5577e10, whereas mprime/Prime95 is of course limited to 4e9. I have started running these numbers with B2=4e10. Can anyone tell me whether it is better to stick with the default bounds, or should I go for the maximum of 1.8e11? And how will these curves be reported to George, since one curve with B2=4e10 is obviously worth more than a curve with B2=4e9?
2004-09-19, 14:34   #24
garo

Aug 2002
Termonfeckin, IE

2⁴×173 Posts

Three posts in a row!

A couple more questions.

Is the 5.1-beta build for Athlon that is posted on the primenumbers Yahoo group site the fastest, or is there a faster build available?

Alex mentioned above:
Quote:
 If you do stage 1 on two (or more) numbers simultaneously, you should split the residue modulo the individual numbers before feeding it to gmp-ecm. It saves both time and memory. The thread http://www.mersenneforum.org/showthread.php?t=2326 has more info.
But in the other thread he mentions that:

Quote:
 You can do the stage 1 on M1574 and then stage 2 in GMP-ECM on M787 and P787 individually, but it's a little awkward. Produce the results.txt file with stage 1 residues as usual, then copy it to two files (i.e. results.m787 and results.p787). Get the remaining cofactors of M787 and P787 in decimal or hexadecimal form (GMP-ECM can parse either) and in a text editor, substitute in results.m787 the "N=0x....;" by "N=;", similarly for results.p787. GMP-ECM will then reduce the stage 1 residue modulo the cofactor and all should work out nice.
which implies that, if the N values are properly broken down, the residue is reduced appropriately by GMP-ECM. This makes sense, as reducing the residue should be easy and a logical step for GMP-ECM to take every time.

So I'm assuming that the steps outlined two posts above are correct and sufficient.
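The file edit Alex describes in the quote above can be sketched as a small script. The filenames and the regular expression for the "N=...;" field are my assumptions about the GMP-ECM save-file layout, not something spelled out in the thread:

```python
import re

def retarget(body, cofactor):
    """Replace every "N=...;" field in a GMP-ECM results file with the
    given cofactor (decimal or hex string), as described above."""
    return re.sub(r"N=[^;]*;", "N=%s;" % cofactor, body)

def retarget_results(src, dst, cofactor):
    """Copy a stage 1 results file, pointing the copy at one cofactor
    so GMP-ECM reduces the stage 1 residue modulo that cofactor."""
    with open(src) as f:
        body = f.read()
    with open(dst, "w") as f:
        f.write(retarget(body, cofactor))

# e.g. retarget_results("results.txt", "results.m787", m787_cofactor)
#      retarget_results("results.txt", "results.p787", p787_cofactor)
```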

2004-09-19, 18:15   #25
akruppa

"Nancy"
Aug 2002
Alexandria

2,467 Posts

Quote:
 2) GMP-ECM seems to be having trouble with N values expressed as hex, regardless of what other people have posted here. When I converted the hex value into decimal for N, it worked fine. Otherwise it complains about finding an "x", which comes from the "0x" before the hex value. Funnily enough, the residues can be expressed in hex without any problem whatsoever.
This is a problem of the "GMP-ECM 5.1-beta" binary. It's more of a CVS snapshot than a beta and was made public without Paul Zimmermann's approval. Both Paul and I advise against using it, but since it's apparently the only Windows binary of GMP-ECM floating around out there, many people use it anyway. AFAIK, the MODMULN bug that was fixed in 5.0.3 is also present in "5.1-beta", so beware.

Quote:
 One last question: For these numbers GMP-ECM's default B2 is 2.5577e10 whereas mprime/Prime95 are of course limited to 4e9. I have started running these numbers with B2=4e10.
The default B2 choice aims to keep the time spent in stage 2 at about half of that of stage 1. Since you use the much faster Prime95 for stage 1, you should use a lower B2. With B1=11M, 40G may even be a bit high.

Here are some (approximate) expected numbers of curves for B1=11M:
B2=1.1e9: 11000
B2=5e9: 8200
B2=10e9: 7100
B2=20e9: 6200
B2=40e9: 5500

If you do stage 2 with B2=40G, each curve has about twice as high a chance of finding a factor as with B2=1.1e9, so when reporting your effort, simply double the number of curves.
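Alex's reporting rule can be written out as a small helper. The curve counts are the approximate ones from his table above; the conversion by ratio of per-curve odds is my generalization of "simply double the number of curves":

```python
# Approximate expected curve counts for B1=11e6, from the table above.
expected_curves = {1.1e9: 11000, 5e9: 8200, 10e9: 7100, 20e9: 6200, 40e9: 5500}

def equivalent_curves(curves_run, b2, b2_ref=1.1e9):
    """Convert curves run at bound b2 into the equivalent number of
    curves at the reference bound, by the ratio of per-curve odds."""
    return curves_run * expected_curves[b2_ref] / expected_curves[b2]

# 100 curves at B2=40e9 count like 100 * 11000/5500 = 200 curves at
# B2=1.1e9 -- i.e. "simply double the number of curves".
```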

Quote:
 which implies that if the N values are properly broken down the residue is reduced appropriately by GMP-ECM. This makes sense as reducing the residue should be easy and a logical step for GMP-ECM to take every time.
GMP-ECM does the mod reduction when it does arithmetic, but unfortunately not immediately when it reads residues from files. Sorry, that was an oversight on my part. I'm attaching a diff file that fixes it.

I'm not 100% certain that reading unreduced residues from a save file works correctly. In the case of base-2 reduction I expect it does, and that is what will typically be used when resuming Prime95 residues. If, however, the remaining cofactor is only ~70% of the size of the full 2^n±1 number, or for Aurifeuillian factors, MODMULN will be used, which may have problems. I did a test just now and it worked, but I haven't looked at the code hard enough yet to be sure it always will.

If you can arrange it at all, I'd recommend using a patched 5.0.3 for stage 2. I can't offer a Windows binary, though, lacking Windows.

Alex

EDIT: Duh. Of course the residue gets properly reduced when it's converted to an mpres_t type. No patch necessary, but I'll attach it anyway.
Attached Files
 resumediff.zip (392 Bytes, 173 views)

Last fiddled with by akruppa on 2004-09-19 at 18:22

2004-09-19, 21:49   #26
garo

Aug 2002
Termonfeckin, IE

2⁴×173 Posts

Thanks Alex for your detailed post. I used 5.1-beta because it was the only version for which I could find a compiled Athlon binary. I do not have the appropriate tools for compiling on my Athlon, so I took the easy way out. I guess I will download them now. :sigh:

I have since found an Athlon-compiled version of ecm-5.0c, which I assume is the same as 5.0.3. For stage 2 on 2^1033-1 it was significantly slower than 5.1-beta. The reason seems to be that it chooses k=5 instead of the k=4 that 5.1-beta picks. As a result the B2' value is higher, and the d and dF values are different as well. The slowdown was about 16%, so it cannot be accounted for by the higher B2' alone. I also ran the test posted on the 5.0.3 page and it worked fine, so if that checks for the MODMULN bug, 5.1-beta does not have it.

On a 2.8 GHz P4, stage 1 on M4132 with mprime takes 440 seconds. I am running four separate stage 2s on an Athlon 1333 for 1033-, 1033+, 2066L and 2066M, and they take from 635 to 825 seconds depending on the size of the cofactor. So I should certainly reduce the B2 bounds. I will use the table you posted above, together with the time it takes to run stage 2 to these limits, to figure out the optimal B2 bound. It is likely to be less than even the default.

Also, for both 1033- and 1033+, GMP-ECM recognized which numbers the residues belonged to and used the special multiplication routines. Your edit confirms my suspicion that the residues are indeed reduced.
2004-09-20, 18:04   #27
garo

Aug 2002
Termonfeckin, IE

101011010000₂ Posts

Ok! So I finally downloaded and compiled ECM 5.0.3 with GMP 4.1.3. I ran it on the P4 and found that k=3 or k=4 was usually optimal, and that ecm does not always choose the best k, though it is usually reasonably close.

I ran GMP-ECM for various values of B2. I found that 40e9 is indeed very high, and that the best performance according to the curves-required table given by Alex a few posts above is actually 1.1e9, i.e. B2=100*B1! I give the running times below for P1033; since the cofactors are similar in size, the numbers, or at least the trends, are representative.

Code:
B2=1.1e9: 11000 * 33s  =  363k secs
B2=  5e9:  8200 * 101s =  828k secs
B2= 10e9:  7100 * 152s = 1079k secs
B2= 20e9:  6200 * 257s = 1593k secs
B2= 40e9:  5500 * 496s = 2728k secs

Ah! I was thinking that this obviously points to 1.1e9 as the best choice, but then I realized that I was not factoring in stage 1 times. Stage 1 with mprime takes 440 secs, but that covers 4 numbers, so it can be divided by 4 to give about 110 secs per curve. So the new running times are:

Code:
B2=1.1e9: 11000 * (33+110)s  = 1573k secs
B2=  5e9:  8200 * (101+110)s = 1730k secs
B2= 10e9:  7100 * (152+110)s = 1860k secs
B2= 20e9:  6200 * (257+110)s = 2275k secs
B2= 40e9:  5500 * (496+110)s = 3333k secs

Hmm. So B2=1.1e9 is still the winner, though not by as much. I guess this result holds for B1=11e6. For B2=3.3e9: ??? * (66+110)s = say 1584k secs for 9000 curves. And B2=2.2e9 took (53+110)s * say 10000 = 1630k secs.

I'm just posting this so I can get a sanity check on these numbers from Geoff or Alex. Is it okay for B2=100*B1 to be the optimal bound? I thought GMP-ECM was more efficient in stage 2, but I guess the difference is not enough to justify increasing the bounds for such small numbers.

Oh yeah, BTW: I ran mprime for stage 2 on M4132 with the same bound 1.1e9 and it took 192 seconds, whereas all four GMP-ECM stage 2s together took 142 seconds. So it is still worth doing separate stage 2s on the numbers using GMP-ECM.
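As a sanity check, the arithmetic in the post above can be recomputed in a few lines. The curve counts and per-curve stage 2 times are copied from the tables; the 110 s stage 1 share is my reading of the 440 s mprime run split across four numbers:

```python
stage1_share = 440 / 4  # mprime stage 1 on M4132, shared by four numbers

# B2 -> (expected curves, GMP-ECM stage 2 seconds per curve on P1033)
timings = {
    1.1e9: (11000, 33),
    5e9:   (8200, 101),
    10e9:  (7100, 152),
    20e9:  (6200, 257),
    40e9:  (5500, 496),
}

# Total expected work = curves * (stage 2 time + stage 1 share per curve)
totals = {b2: n * (t2 + stage1_share) for b2, (n, t2) in timings.items()}
best = min(totals, key=totals.get)
# best comes out as B2=1.1e9, at 11000 * (33 + 110) = 1,573,000 s.
```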
2004-09-20, 18:33   #28
R.D. Silverman

"Bob Silverman"
Nov 2003
North of Boston

2²×1,877 Posts

Quote:
 Originally Posted by garo Ok! So I finally downloaded and compiled ECM 5.0.3 with GMP4.1.3. I ran it on the P4 and found that k=3 or k=4 was usually optimal and that ecm does not always choose the k that is best. But it is usually reasonably close. I'm just posting this so I can get a sanity check on these numbers from Geoff or Alex. Is it okay for B2=100*B1 to be the optimal bound? I thought GMP-ECM was more efficient for stage 2 but I guess the difference is not so much to increase the bounds for such small numbers.
Query: When you say that "k= 3 or k = 4" was usually optimal, what do you
mean? What are you optimizing? Won't the optimal k depend on the step 2
limit chosen? {I assume here that k is the number of partitions for step 2;
one partitions the step 2 interval into pieces}

If anyone would bother to read my joint paper with Sam Wagstaff Jr., they
would find that, in order to "maximize the probability of success per unit time", you should spend the same amount of time in step 2 as in step 1.

Before one can "optimize", one must specify what is being optimized.

Theoretically, with an "idealized FFT implementation of step 2", one
could take the step 2 limit to (step 1 limit)^2 in an amount of time equal to
that spent in step 1. This is the theoretical best bound achievable with FFTs.

GMP-ECM does indeed use better algorithms for step 2 than that used by Mprime.

Reading Peter Montgomery's dissertation would also be useful. He did a
superb job.

Last fiddled with by smh on 2004-09-20 at 20:35 Reason: Fixed quotes

2004-09-20, 19:04   #29
akruppa

"Nancy"
Aug 2002
Alexandria

2467₁₀ Posts

I'm a little surprised, too; I'd have expected that gmp-ecm makes larger B2 worthwhile. But then again, Prime95 is really awfully efficient on a P4, whereas all-integer GMP is not. M4132 makes excellent use of Prime95's length-192 transform (threshold 4355), and at a cofactor size of ~250 digits GMP really starts lagging behind Prime95's DWT.

For comparison: on an Athlon XP 3000+, stage 1 with mprime takes 730s, and stage 2 on the c255 of P1033 with gmp-ecm 5.0.3 takes 26.5s. The ratio is more than twice what you see on the P4. If everything were done on an Athlon, B2=5e9 or 10e9 would be optimal.

If you can do stage 2 on an Athlon, a slightly larger B2 may become worthwhile. Otherwise just stick with B2=1.1e9; at least it saves a little time wrt Prime95.

Alex
2004-09-21, 10:44   #31
garo

Aug 2002
Termonfeckin, IE

2⁴·173 Posts

A question: I just started using Richard McIntosh's Maple program to calculate the number of curves required. With p=45.5 (# of digits) and B1=11e6 I got the following numbers of curves:

B2=11e8: 10200
B2=22e8: 8657
B2=26e9: 5227 (GMP-ECM default)
B2=40e9: 4830

These numbers are less than what Alex posted, and this presumably is not even taking into account the Brent-Suyama extension. I also cross-checked against the numbers at http://www.mersenneforum.org/showthread.php?t=2326 and found that the Maple program underestimates relative to what has been posted there as well. However, if the Brent-Suyama extension figures posted by Alex are taken into account, the Maple program overestimates. Any clue as to why this is happening and which I should trust?

Last fiddled with by garo on 2004-09-21 at 10:47
2004-09-21, 12:40   #32
R.D. Silverman

"Bob Silverman"
Nov 2003
North of Boston

2²·1,877 Posts

Quote:
 Originally Posted by garo I just started using Richard McIntosh's maple program to calculate the number of curves required. With p=45.5 (# of digits) and B1=11e6 I got the following number of curves: B2=11e8: 10200 B2=22e8: 8657 B2=26e9: 5227 (GMP-ECM default) B2=40e9: 4830 These numbers are less than what Alex posted and this presumably is not even taking into account the Brent-Suyama extension. I also cross checked with the numbers on http://www.mersenneforum.org/showthread.php?t=2326 and found that the maple program is underestimating from what has been posted there as well. However, if the Brent-Suyama extension figures posted by Alex are taken into account the maple program is overestimating. Any clue as to why this is happening and what should I trust?
Underestimating.
Overestimating.

By how much???

If you read my paper with Sam Wagstaff Jr., you will discover that the
response surface [probability of success with given limits and multiple curves]
is VERY shallow in the neighborhood of the optimum.

You are trying to estimate the number of curves with much too fine a precision.
I would want multi-precision versions of Dickman's function before trying to
estimate #curves with the degree of precision you (seemingly) want.
The response surface is just too flat in the region of the optimum.

Furthermore, do you know how #curves gets computed? It is 1/(per curve probability). Now look at 1/(per curve probability - epsilon). #curves is
quite sensitive to small changes in the estimated probability.
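Bob's point about sensitivity is easy to see numerically. The probability used here is an illustrative value of my choosing, not taken from any table in this thread:

```python
p = 1.0 / 10000      # suppose the true per-curve success probability
eps = 0.05 * p       # a mere 5% error in estimating p ...
curves_true = 1 / p
curves_est = 1 / (p - eps)
# ... moves the curve-count estimate from 10000 to about 10526:
# the estimation error passes straight through into #curves.
```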

Furthermore, trying to compute #curves for 45.5 digits is just silly in my
opinion. I presume that by 45.5 digits you mean sqrt(10)*10^44?

I would use the GMP-ECM default. It does a good job balancing step 1/step 2
times.

Note also that the probability depends on whether curves are chosen with
group order a priori divisible by 12, or chosen so that the order is
divisible by 8 with prob = .5 and by 16 with prob = .5.

2004-09-21, 12:57   #33
garo

Aug 2002
Termonfeckin, IE

2⁴·173 Posts

Hi Bob,

Yes, I do know how the # of curves is calculated; I looked through the Maple code. Thanks for pointing out that I shouldn't try to estimate the number of curves to too fine a degree of precision. The Maple program's output is approximately 5% lower than the estimates posted by geoff, and 5% higher than the estimates posted by Alex after the Brent-Suyama extension.

Also, if you read the post by philmoore where he posted the Maple program, you will find that it explains the reason for choosing 45.5 digits. It is indeed a way to estimate the probability of finding a factor of size sqrt(10)*10^44.

I'll hold my tongue (or rather my fingers) till I've read your paper and had some time to think about it. But in the meantime, please try not to treat me like those jokers who keep coming up with super-efficient new ways to prove primality.

