mersenneforum.org ECM using gmp-ecm and mprime on a P4
 Register FAQ Search Today's Posts Mark Forums Read

 2004-04-05, 06:56 #1 geoff     Mar 2003 New Zealand 13×89 Posts ECM using gmp-ecm and mprime on a P4 (This post follows from the discussion in the Hardware forum, see http://www.mersenneforum.org/showthread.php?t=2307). These are the times I got for curves on the C204 of M787, the C169 of P787 and their product C373 of M1574 using the pre-compiled Debian gmp-ecm package (gmp-ecm 5.0.3): Code: Number | Stage 1 | Stage 2 | Total per | Peak | Time for | B1=43e6 | B2=180e9 | curve | memory | 9000 curves -------------+-----------+-----------+-----------+--------+------------- M787 (C204) | 2290 sec. | 1899 sec. | 4189 sec. | 188 MB | 436 days. P787 (C169) | 2112 sec. | 1380 sec. | 3492 sec. | 159 MB | 364 days. M1574 (C373) | 5855 sec. | 3798 sec. | 9653 sec. | 317 MB | 1006 days. For comparison here are the corresponding times using mprime 23.5: Code: Number | Stage 1 | Stage 2 | Total per | Peak | Time for | B1=44e6 | B2=4.29e9 | curve | memory | 19300 curves -------------+-----------+-----------+-----------+--------+-------------- M787 | 541 sec. | 246 sec. | 787 sec. | 50 MB | 176 days. P787 | 1354 sec. | 491 sec. | 1845 sec. | 50 MB | 412 days. M1574 | 684 sec. | 304 sec. | 988 sec. | 88 MB | 221 days. For this machine, running curves on M1574 with mprime gives more than twice the productivity of running separate curves on M787 and P787 with either program. But I am not sure how to do the comparison if mprime is used for stage 1 and gmp-ecm for stage 2. Is this even possible given that mprime is presumably working with the whole number while gmp-ecm works on the composite factor? If so what bounds should I use for the comparison?
 2004-04-05, 20:26 #2 philmoore     "Phil" Sep 2002 Tracktown, U.S.A. 25·5·7 Posts Thanks for the interesting data. I don't know the answer to your question and hope someone else can answer it. Your mprime data is much like my prime95 data, in that a curve on M1574 does not take much more time than a curve on M787. So we really do get a big boost when we can work on pairs of numbers simultaneously. I found the memory usage info interesting. gmp-ecm uses a memory intensive second stage, so the large memory usage there was no surprise, but I didn't expect mprime's memory usage to be quite so high. What were your memory settings? Sometimes with small exponents it is possible to reach a point of diminishing returns where additional memory does little to make the program run faster. Comparing your data to that of wblipp, it almost looks like the best solution is to run stage 1 on a P-4 and switch to an Athlon for stage 2 with gmp-ecm! Of course the ideal solution would be to combine the two programs...
 2004-04-05, 23:49 #3 biwema     Mar 2004 17D16 Posts If you are using the gmpecmstage1hook of prime 95, prime95 works on the mersenne/fermat number but writes just the composite (in hexadecimal) into the results.txt file from which you can resume with GMP-ECM. In case gmp-ecm is faster with the whole mersenne/fermat number instead of just the composite, it might notice that it is a divisor of the mersenne/fermat and use that one instead. Actually I couldn's manage to make gmp-ecm 5.1 beta run with that file. I does not accept a hexadecimal input number, and the variable with the intermediate result has the wrong name (QX instead of Q, I think). It is possible to correct that but unwieldy with large files.
 2004-04-06, 07:59 #4 akruppa     "Nancy" Aug 2002 Alexandria 9A316 Posts You can do the stage 1 on M1574 and then stage 2 in GMP-ECM on M787 and P787 individually, but it's a little awkward. Produce the results.txt file with stage 1 residues as usual, then copy it to two files (i.e. results.m787 and results.p787). Get the remaining cofactors of M787 and P787 in decimal or hexadecimal form (GMP-ECM can parse either) and in a text editor, substitute in results.m787 the "N=0x....;" by "N=;", similarly for results.p787. GMP-ECM will then reduce the stage 1 residue modulo the cofactor and all should work out nice. Better try on a number with a know factor and accompanying curve parameters first, though. Alex
2004-04-06, 10:14   #5
geoff

Mar 2003
New Zealand

13×89 Posts

Here are the times using mprime for stage 1 and gmp-ecm for stage 2, the command line was 'ecm -resume results.txt 1 44e6-4.29e9'.

Code:
Number       | Stage 1   | Stage 2   | Total per | Peak   | Time for
| B1=44e6   | B2=4.29e9 | curve     | memory | 19300 curves
-------------+-----------+-----------+-----------+--------+--------------
M787         |  541 sec. |  114 sec. |  655 sec. |  28 MB | 146 days.
P787         | 1354 sec. |   89 sec. | 1443 sec. |  23 MB | 322 days.
M1574        |  684 sec. |  244 sec. |  928 sec. |  46 MB | 207 days.
Clearly is it worth using gmp-ecm on a P4 to do stage 2 of these numbers. With these bounds the best strategy is to do curves on M1574, but that might change if a larger B2 is used, and if the stage 1 of M1574 and the stage 2 of P787 is used to do a curve for P787 as Alex suggests.

I will try to use this Maple program http://www.mersenneforum.org/showpo...06&postcount=19 to work out what the best B2 bound is and post the results for comparison.

Quote:
 Originally Posted by philmoore What were your memory settings?
I let mprime use up to 384 MB, which is probably the same as no constraint at all for these small exponents.

 2004-04-07, 14:46 #6 geoff     Mar 2003 New Zealand 13·89 Posts The following table shows the time to do N 50 digit curves with B1 fixed at 43e6. Stage 2 times are for gmp-ecm on a single curve. The P787 total is calculated using the stage 1 from M1574 since this is always faster than doing stage 1 for P787. The M+P787 column is calculated using stage 1 of M1574 plus stage 2 for both M787 and P787. Where f(B1,B2) is the probability of success at the 50 digit level defined by running the Maple program with parameters p=50, B1, B2, I calculated the number N of curves needed at bounds B1,B2 to be N = ceil(9000*f(43e6,180e9)/f(B1,B2)). The times for stage 1 (mprime) were: M787=530 sec. M1574=674 sec. Code:  Stage 2 time (sec.) Time for N curves (days) B2 N | M787 P787 M1574 | M787 P787 M1574 M+P787 ----------------+---------------------+---------------------------- 4.29e9 19073 | 114 89 244 | 142 168 203 194 6e9 17625 | 154 120 326 | 140 162 204 193 8e9 16512 | 201 158 419 | 140 159 209 197 10e9 15719 | 224 176 468 | 137 155 208 195 12e9 15112 | 237 187 491 | 134 146 204 192 14e9 14626 | 274 212 586 | 136 150 213 196 16e9 14223 | 286 221 613 | 134 147 212 194 18e9 13880 | 329 255 699 | 138 149 221 202 20e9 13584 | 351 273 745 | 139 149 223 204 30e9 12525 | 486 380 1008 | 147 153 244 223 180e9 9000 | 1899 1380 3798 | 253 214 466 412 From these times the best strategy is to do a common stage 1 on M1574 with mprime then do separate stage 2s on M787 and P787 with gmp-ecm, but the exact choice of B2 doesn't seem to make much difference. The default B2=100*B1 bound is reasonable, as is anything up to about B2=400*B1. Since the chance of finding a factor in M787 or P787 is equal while the P787 stage 2 computation is cheaper, I think this could be improved by using split B2 bounds to make the stage 2 times equal. I plan to do these calculations for a few other exponents for comparison, but first I will compile optimised gmp-ecm and libgmp binaries to see how much difference that makes over the generic Debian binaries.
 2004-04-08, 16:25 #7 geoff     Mar 2003 New Zealand 115710 Posts I recompiled libgmp with P4 specific optimisations and it has made a huge difference to the gmp-ecm times. This should make a larger B2 worthwhile now. Code:  Generic P4 Improvement ------------------------------------------------------------ Stage 1 B1=43e6 M787 2290s 1202s 91% P787 2112s 1041s 103% M1574 5855s 2842s 106% Stage 2 B2=180e9 M787 1899s 1088s 75% P787 1380s 817s 69% M1574 3798s 2015s 88%
 2004-04-08, 17:51 #8 akruppa     "Nancy" Aug 2002 Alexandria 2,467 Posts I think your expected number of curves do not account for the effect of the Brent-Suyama extension. I've run a program to count how many smooth number there are in the interval [2.5e48, 2.5e48+1e8], these are the results. Code: B2 deg B1-sm. B2-sm. Br.-Su. N 4.29e9 12 51 464 61 17361 1e10 12 51 579 64 14409 2e10 12 51 669 79 12500 18e10 12 51 1019 107 8491 18e10 30 51 1019 151 8188 Parameters were B1=4.29e9, B2 as given in the first column of the table, Brent-Suyama extension with Dickson polynomials of degree as in the second column, roots of F and G generated as GMP-ECM would with these parameters. For B2=18e10, I didn't actually compute the exponents for the roots of F and G to look for a match (would have taken too long), but merely added up expected values. However, the theoretical values and actual counts are usually in good agreement. The last column gives the expected number of curves to find a factor, i.e. 1e8/(B1-sm. + B2-sm. + Br.-Su.). Maybe these numbers will help you pick a good B2 value. Good luck! Alex
2004-04-09, 08:11   #9
smh

"Sander"
Oct 2002
52.345322,5.52471

29·41 Posts

Quote:
 I recompiled libgmp with P4 specific optimisations and it has made a huge difference to the gmp-ecm times.
Any chance you could make an optimised P4 windows version?

2004-04-10, 02:01   #10
geoff

Mar 2003
New Zealand

13·89 Posts

Quote:
 I think your expected number of curves do not account for the effect of the Brent-Suyama extension.
Thanks for that information, I will see what difference it makes.

Quote:
 Any chance you could make an optimised P4 windows version?
I am not sure how to do that, but I use a P4 2.66GHz Windows machine at my local university to access the net, so I downloaded this binary http://www.geocities.com/greatpsycho/ecm5_p6_sse2.zip to try out. It is statically linked so a separate libgmp is not required. Adjusting for clock speed it runs in a very similar time to the one I compiled on Linux.

2004-04-10, 15:42   #11
smh

"Sander"
Oct 2002
52.345322,5.52471

4A516 Posts

Quote:
 Originally Posted by geoff I am not sure how to do that, but I use a P4 2.66GHz Windows machine at my local university to access the net, so I downloaded this binary http://www.geocities.com/greatpsycho/ecm5_p6_sse2.zip to try out. It is statically linked so a separate libgmp is not required. Adjusting for clock speed it runs in a very similar time to the one I compiled on Linux.
Thats the one i'm already using, but i'm always hoping for a version thats a few percentage faster

 Similar Threads Thread Thread Starter Forum Replies Last Post Unregistered Information & Answers 3 2011-08-05 09:17 Vincanis Information & Answers 4 2011-06-30 13:36 Lazlow PrimeNet 3 2008-10-29 20:40 aaronl Linux 1 2005-11-10 16:50 antiroach Software 2 2004-07-19 04:07

All times are UTC. The time now is 07:36.

Fri Feb 3 07:36:56 UTC 2023 up 169 days, 5:05, 1 user, load averages: 0.59, 0.76, 0.78

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.

≠ ± ∓ ÷ × · − √ ‰ ⊗ ⊕ ⊖ ⊘ ⊙ ≤ ≥ ≦ ≧ ≨ ≩ ≺ ≻ ≼ ≽ ⊏ ⊐ ⊑ ⊒ ² ³ °
∠ ∟ ° ≅ ~ ‖ ⟂ ⫛
≡ ≜ ≈ ∝ ∞ ≪ ≫ ⌊⌋ ⌈⌉ ∘ ∏ ∐ ∑ ∧ ∨ ∩ ∪ ⨀ ⊕ ⊗ 𝖕 𝖖 𝖗 ⊲ ⊳
∅ ∖ ∁ ↦ ↣ ∩ ∪ ⊆ ⊂ ⊄ ⊊ ⊇ ⊃ ⊅ ⊋ ⊖ ∈ ∉ ∋ ∌ ℕ ℤ ℚ ℝ ℂ ℵ ℶ ℷ ℸ 𝓟
¬ ∨ ∧ ⊕ → ← ⇒ ⇐ ⇔ ∀ ∃ ∄ ∴ ∵ ⊤ ⊥ ⊢ ⊨ ⫤ ⊣ … ⋯ ⋮ ⋰ ⋱
∫ ∬ ∭ ∮ ∯ ∰ ∇ ∆ δ ∂ ℱ ℒ ℓ
𝛢𝛼 𝛣𝛽 𝛤𝛾 𝛥𝛿 𝛦𝜀𝜖 𝛧𝜁 𝛨𝜂 𝛩𝜃𝜗 𝛪𝜄 𝛫𝜅 𝛬𝜆 𝛭𝜇 𝛮𝜈 𝛯𝜉 𝛰𝜊 𝛱𝜋 𝛲𝜌 𝛴𝜎𝜍 𝛵𝜏 𝛶𝜐 𝛷𝜙𝜑 𝛸𝜒 𝛹𝜓 𝛺𝜔