[QUOTE=jasonp]As mentioned before, experimental code is now in place to attempt to get Intel CPUs to sieve faster. If you use such a CPU and have logs of old factorizations lying around, I'm especially interested in whether the performance changes when those factorizations are repeated with 0.88. I honestly don't know if it will help, and it may hurt.[/QUOTE]

Factoring: 314159265358979323846264338328907078125461821459237067283727

Using 0.87 with an Intel P4 1.8 GHz = 44.287 seconds
Using 0.88beta with an Intel P4 1.8 GHz = 63.501 seconds

That is using your precompiled Windows binaries in Win2k. Let me try some other P4 machines as well. |
The c60 which took 133s or 128s with v0.87 now takes 120s or 133s with v0.88 - no great change. This is on a 500 MHz Pentium 3 Katmai, compiled with gcc 3.4.2 with the stock Makefile except for -march=pentium3.
Alex |
Here are some more benchmarks when factoring:
[b]314159265358979323846264338328907078125461821459237067283727[/b]

Looks like with the P4, that the new version is slower than 0.87.

[b]Pentium 4 (3.0 GHz, 1MB L2 Cache)[/b]
Linux 2.6.9 with gcc 3.4.2
v0.87 = 21.589 sec (-march=athlon)
v0.87 = 24.054 sec (-march=pentium4)
v0.88 = 26.297 sec (-march=athlon)
v0.88 = 26.451 sec (-march=pentium4)

[b]Pentium 4 (1.8 GHz, 256KB L2 Cache)[/b]
Windows2000 cygwin with 3.3.1
v0.87 = 37.702 sec (-march=pentium4)
v0.87 = 39.823 sec (-march=athlon)
v0.88 = 47.397 sec (-march=pentium4)
v0.88 = 54.129 sec (-march=athlon)

[b]Pentium 3 (900 MHz, 256KB L2 Cache)[/b]
Linux 2.4.20 with gcc 3.2.2
v0.88 = 53.581 sec (-march=athlon)
v0.87 = 56.148 sec (-march=athlon)
v0.87 = 57.432 sec (-march=pentium3)
v0.88 = 57.438 sec (-march=pentium3) |
More times for 0.88 beta
Using the same c60 I found the following on this system.

Abit NF7-S motherboard (AthlonXP 3000+)
AMD-2167, 512K Cache, CPU BUS 166 MHz, 768M PC2100, Bus 333 MHz

worktodo.ini
314159265358979323846264338328907078125461821459237067283727

Windows 2000
0.87 = 12s
0.88 = 12s

Slackware-10, GCC-4.0-CURRENT
0.87 = 12s
0.88 = 12s |
1 Attachment(s)
[QUOTE=Jeff Gilchrist]
Looks like with the P4, that the new version is slower than 0.87. [/QUOTE]

I've added a few optimizations to the polynomial switching code, and reverted some of the Intel CPU stuff. Updated beta is at

[url]www.boo.net/~jasonp/msieve088b2.tar.gz[/url]
[url]www.boo.net/~jasonp/msieve_beta2.exe[/url]

If you want to compare runtimes, I've attached the numbers I use for performance and QA testing. Runtimes for a 2GHz Opteron:

c60: 8 s
c70: 2m 22s
c80: 18m 47s
c95: 7h 5m
c100: 17h 30m

Visual Studio .NET seems to do a pretty good job with this program, even though things like prefetches are turned off. The c70 above takes 3m 5s on a 2.8GHz Northwood P4 when compiled with .NET and all the optimization switches I could find. The c60 runs in (I think) 17 seconds, which appears better than what gcc is managing.

Back when this was an ordinary QS utility (i.e. when it didn't use multiple polynomials), a c70 took 3.5 hours on a 1GHz K7. Now a c70 is done in under 5 minutes on the same machine!

Let me know how this version works out.
jasonp |
Running beta 2 on a Pentium M 1.4 GHz with 512 MB RAM and WinXP.

C60 -> 00:00:18 (but isn't it a C59?)
C70 -> 00:03:27

[code]
Wed Dec 22 11:42:16 2004 Msieve v. 0.88
Wed Dec 22 11:42:16 2004 random seeds: 5a6fbcb0 2b9696ba
Wed Dec 22 11:42:16 2004 factoring 23099884946009620096243803727122140213996633382129002610539 (59 digits)
Wed Dec 22 11:42:17 2004 using multiplier of 3
Wed Dec 22 11:42:17 2004 using sieve block of 32768
Wed Dec 22 11:42:17 2004 using a sieve bound of 51283 (2647 primes)
Wed Dec 22 11:42:17 2004 using large prime bound of 2359018
Wed Dec 22 11:42:33 2004 found 3163 relations (1449 full + 1714 partial), need 2743
Wed Dec 22 11:42:33 2004 begin with 12870 relations
Wed Dec 22 11:42:33 2004 reduce to 3194 relations in 2 passes
Wed Dec 22 11:42:33 2004 attempting to read 1449 full and 3194 partial relations
Wed Dec 22 11:42:33 2004 recovered 1449 full and 3194 partial relations
Wed Dec 22 11:42:33 2004 recovered 3785 polynomials
Wed Dec 22 11:42:33 2004 attempting to build 1714 cycles
Wed Dec 22 11:42:33 2004 found 1714 cycles in 1 passes
Wed Dec 22 11:42:33 2004 distribution of cycle lengths:
Wed Dec 22 11:42:33 2004 length 2 : 1714
Wed Dec 22 11:42:33 2004 largest cycle: 2 relations
Wed Dec 22 11:42:33 2004 2647 x 2711 system, weight 66061 (avg 24.37/col)
Wed Dec 22 11:42:33 2004 reduce to 2510 x 2574 in 3 passes
Wed Dec 22 11:42:33 2004 lanczos halted after 41 iterations
Wed Dec 22 11:42:33 2004 recovered 64 nontrivial dependencies
Wed Dec 22 11:42:34 2004 prp30 factor: 115342651612924837149281980471
Wed Dec 22 11:42:34 2004 prp30 factor: 200271838933700560014162995309
Wed Dec 22 11:42:34 2004 elapsed time 00:00:18
Wed Dec 22 11:42:44 2004
Wed Dec 22 11:42:44 2004
Wed Dec 22 11:42:44 2004 Msieve v. 0.88
Wed Dec 22 11:42:44 2004 random seeds: 3f6afeb5 18ebd953
Wed Dec 22 11:42:44 2004 factoring 9813956594010984314135286435578351358968896438035014755553156535972277 (70 digits)
Wed Dec 22 11:42:45 2004 using multiplier of 1
Wed Dec 22 11:42:45 2004 using sieve block of 32768
Wed Dec 22 11:42:45 2004 using a sieve bound of 249593 (11000 primes)
Wed Dec 22 11:42:45 2004 using large prime bound of 19967440
Wed Dec 22 11:46:06 2004 found 11402 relations (5626 full + 5776 partial), need 11096
Wed Dec 22 11:46:06 2004 begin with 58272 relations
Wed Dec 22 11:46:06 2004 reduce to 10792 relations in 2 passes
Wed Dec 22 11:46:06 2004 attempting to read 5626 full and 10792 partial relations
Wed Dec 22 11:46:07 2004 recovered 5626 full and 10792 partial relations
Wed Dec 22 11:46:07 2004 recovered 14308 polynomials
Wed Dec 22 11:46:07 2004 attempting to build 5776 cycles
Wed Dec 22 11:46:07 2004 found 5776 cycles in 1 passes
Wed Dec 22 11:46:07 2004 distribution of cycle lengths:
Wed Dec 22 11:46:07 2004 length 2 : 5776
Wed Dec 22 11:46:07 2004 largest cycle: 2 relations
Wed Dec 22 11:46:07 2004 11000 x 11064 system, weight 306354 (avg 27.69/col)
Wed Dec 22 11:46:07 2004 reduce to 9954 x 10018 in 3 passes
Wed Dec 22 11:46:08 2004 lanczos halted after 159 iterations
Wed Dec 22 11:46:08 2004 recovered 64 nontrivial dependencies
Wed Dec 22 11:46:12 2004 prp35 factor: 66942360975183897818097898965687583
Wed Dec 22 11:46:12 2004 prp36 factor: 146603084370584142319953689355125419
Wed Dec 22 11:46:12 2004 elapsed time 00:03:27
[/code] |
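On the "isn't it a C59?" question: the log itself reports 59 digits, and this is easy to double-check from a saved log. The following is only a rough Python sketch, not part of msieve. It assumes the log lines look like the ones posted above ("factoring N (D digits)" and "prpNN factor: P"); the function name `check_log` and the field names are made up for illustration.

```python
import re

def check_log(log_text):
    """Summarise each factorization found in an msieve-style log.

    Assumes the line formats seen in the log posted above; other
    msieve versions may print things differently.
    """
    runs = []
    current = None
    for line in log_text.splitlines():
        m = re.search(r"factoring (\d+) \((\d+) digits\)", line)
        if m:
            current = {"n": int(m.group(1)),
                       "reported_digits": int(m.group(2)),
                       "factors": []}
            runs.append(current)
            continue
        m = re.search(r"prp\d+ factor: (\d+)", line)
        if m and current is not None:
            current["factors"].append(int(m.group(1)))

    for run in runs:
        product = 1
        for f in run["factors"]:
            product *= f
        # Does the digit count in the log match the actual length of n?
        run["digits_ok"] = len(str(run["n"])) == run["reported_digits"]
        # Do the reported factors multiply back to n?
        run["product_ok"] = product == run["n"]
    return runs

# The first run from the posted log, trimmed to the lines that matter here.
SAMPLE = """\
Wed Dec 22 11:42:16 2004 factoring 23099884946009620096243803727122140213996633382129002610539 (59 digits)
Wed Dec 22 11:42:34 2004 prp30 factor: 115342651612924837149281980471
Wed Dec 22 11:42:34 2004 prp30 factor: 200271838933700560014162995309
"""

for run in check_log(SAMPLE):
    print(len(str(run["n"])), "digits,",
          "digits_ok =", run["digits_ok"],
          "product_ok =", run["product_ok"])
```

Feeding it the whole msieve log file instead of `SAMPLE` should work the same way, as long as the line formats match.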
0.88b2
Running
[url]www.boo.net/~jasonp/msieve088b2.tar.gz[/url]
[url]www.boo.net/~jasonp/msieve_beta2.exe[/url]

c59 = 23099884946009620096243803727122140213996633382129002610539
c70 = 9813956594010984314135286435578351358968896438035014755553156535972277
c79 = 6925808622746428593966930370141140291693107698007831907389943424636842394693639

G4-867 (Apple laptop) Darwin 7.7.0
c59 = 33s
c70 = 6m25s
c79 = 15m51s

AthlonXP-3000+(2172) Windows 2000
c59 = 10s
c70 = 2m07s
C80 = 5m43s

AMD64-3400+(2310) Fedora Core 3
c59 = 9s
c70 = 1m44s
c79 = 4m26 |
This seems to be a definite improvement over 0.88b1:

Benchmarks when factoring:
[b]314159265358979323846264338328907078125461821459237067283727[/b]

[b]Pentium 4 (3.0 GHz, 1MB L2 Cache)[/b]
Linux 2.6.9 with gcc 3.4.2
v0.88b2 = 21.412 sec (-march=pentium4)
v0.87 = 21.589 sec (-march=athlon)
v0.88b2 = 24.025 sec (-march=athlon)
v0.87 = 24.054 sec (-march=pentium4)
v0.88 = 26.297 sec (-march=athlon)
v0.88 = 26.451 sec (-march=pentium4)

[b]Pentium 4 (1.8 GHz, 256KB L2 Cache)[/b]
Windows2000 cygwin with 3.3.1
v0.88b2 = 33.421 sec (-march=pentium4)
v0.88b2 = 34.213 sec (-march=athlon)
v0.87 = 37.702 sec (-march=pentium4)
v0.87 = 39.823 sec (-march=athlon)
v0.88 = 47.397 sec (-march=pentium4)
v0.88 = 54.129 sec (-march=athlon)

[b]Pentium 3 (900 MHz, 256KB L2 Cache)[/b]
Linux 2.4.20 with gcc 3.2.2
v0.88b2 = 42.366 sec (-march=athlon)
v0.88b2 = 42.912 sec (-march=pentium4)
v0.88 = 53.581 sec (-march=athlon)
v0.87 = 56.148 sec (-march=athlon)
v0.87 = 57.432 sec (-march=pentium3)
v0.88 = 57.438 sec (-march=pentium3)

I am now running benchmarks with the tests you posted above so will report back when they are done. |
[QUOTE=error404]
c79 = 6925808622746428593966930370141140291693107698007831907389943424636842394693639
c79 = 15m51s
C80 = 5m43s
c79 = 4m26[/QUOTE]

Note that this number has 5 digits worth of small factors.
jasonp |
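For anyone who wants to check a benchmark input for small factors before timing it, plain trial division is enough. This is just a sketch in Python and has nothing to do with how msieve handles small factors internally. The function name `strip_small_factors` and the bound of 100000 are arbitrary choices for illustration; a factor above the bound would simply stay in the cofactor.

```python
def strip_small_factors(n, bound=100000):
    """Trial-divide n by 2 and by odd numbers up to bound.

    Returns (small_factors, cofactor). Composite trial divisors never
    divide n here because their prime factors were removed earlier.
    """
    small = []
    d = 2
    while d <= bound and d * d <= n:
        while n % d == 0:
            small.append(d)
            n //= d
        d += 1 if d == 2 else 2
    # If the loop stopped because d*d > n, what is left is 1 or a prime;
    # count it as a small factor when it is within the bound.
    if 1 < n <= bound:
        small.append(n)
        n = 1
    return small, n

# The c79 as posted in the thread.
c79 = 6925808622746428593966930370141140291693107698007831907389943424636842394693639

small, cofactor = strip_small_factors(c79)
print("small factors:", small)
print("cofactor digits:", len(str(cofactor)))
```

The number of digits left in the cofactor is what the sieve actually has to work on, so timings for an input with small factors are not comparable with timings for a hard composite of the same size.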
Ok here are my latest benchmark results.

[CODE]Factoring:
C59 = 23099884946009620096243803727122140213996633382129002610539
C70 = 9813956594010984314135286435578351358968896438035014755553156535972277
C80 = 16925808622746428593966930370141140291693107698007831907389943424636842394693639

Pentium 4 (3.0 GHz, 1MB L2 Cache)
Linux 2.6.9 with gcc 3.4.2 (-march=pentium4)
C59 v0.88b2 = 00:00:17
C59 v0.87 = 00:00:18
C59 v0.88 = 00:00:21
C70 v0.88b2 = 00:02:55
C70 v0.87 = 00:02:57
C70 v0.88 = 00:03:26
C80 v0.88b2 = 00:19:41
C80 v0.87 = 00:20:25
C80 v0.88 = 00:21:54

Pentium 4 (1.8 GHz, 256KB L2 Cache)
Windows2000 cygwin with 3.3.1 (-march=pentium4)
C59 v0.88b2 = 00:00:28
C59 v0.87 = 00:00:29
C70 v0.88b2 = 00:05:17
C70 v0.87 = 00:05:29
C80 v0.88b2 = 00:35:37
C80 v0.87 = 00:37:43

Pentium 3 (900 MHz, 256KB L2 Cache)
Linux 2.4.20 with gcc 3.2.2 (-march=pentium3)
C59 v0.88b2 = 00:00:34
C59 v0.87 = 00:00:43
C70 v0.88b2 = 00:08:24
C70 v0.87 = 00:10:03
C80 v0.88b2 = 01:02:17
C80 v0.87 = 01:06:15[/CODE]

So much improved over the 0.88b1 code and also a little faster than the 0.87 code. Good work! I really like the new features you have put in for 0.88, makes it so much easier to test. |
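To put "a little faster" into numbers, the hh:mm:ss times can be converted to seconds and compared. A quick Python sketch follows. The times are copied from the table above; the helper names `to_seconds` and `percent_faster` are made up for this example. These are single runs with one-second resolution, so the short C59 results in particular are only rough.

```python
def to_seconds(hms):
    """Convert an 'hh:mm:ss' string to a number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s

def percent_faster(old_seconds, new_seconds):
    """Percentage of the old runtime saved by the new version."""
    return 100.0 * (old_seconds - new_seconds) / old_seconds

# (machine, input) -> (v0.87 time, v0.88b2 time), as posted above
times = {
    ("P4 3.0 GHz", "C59"): ("00:00:18", "00:00:17"),
    ("P4 3.0 GHz", "C70"): ("00:02:57", "00:02:55"),
    ("P4 3.0 GHz", "C80"): ("00:20:25", "00:19:41"),
    ("P4 1.8 GHz", "C59"): ("00:00:29", "00:00:28"),
    ("P4 1.8 GHz", "C70"): ("00:05:29", "00:05:17"),
    ("P4 1.8 GHz", "C80"): ("00:37:43", "00:35:37"),
    ("P3 900 MHz", "C59"): ("00:00:43", "00:00:34"),
    ("P3 900 MHz", "C70"): ("00:10:03", "00:08:24"),
    ("P3 900 MHz", "C80"): ("01:06:15", "01:02:17"),
}

for (machine, size), (old, new) in times.items():
    gain = percent_faster(to_seconds(old), to_seconds(new))
    print(f"{machine} {size}: v0.88b2 is {gain:.1f}% faster than v0.87")
```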
Msieve 0.88
Version 0.88 is now available. The only difference between 0.88 beta 2 and 0.88 is that the current version prints the date to the screen as the factorization is starting.

Everyone's Pentium times look a lot better than I'd feared, so the performance of AMD and Intel CPUs is pretty close (the AMDs are 10-15% faster on average).

If anyone wants to integrate msieve into some other application, I'll be happy to help.

Happy holidays!
jasonp |