Feedback for new MPQS utility sought (https://www.mersenneforum.org/showthread.php?t=3240)

Jeff Gilchrist 2004-12-21 17:13

[QUOTE=jasonp]As mentioned before, experimental code is now in place
to attempt to get Intel CPUs to sieve faster. If you
use such a CPU and have logs of old factorizations lying
around, I'm especially interested in whether the performance
changes when those factorizations are repeated with 0.88.
I honestly don't know if it will help, and it may hurt.[/QUOTE]

Factoring: 314159265358979323846264338328907078125461821459237067283727

Using 0.87 with an Intel P4 1.8 GHz = 44.287 seconds

Using 0.88beta with an Intel P4 1.8 GHz = 63.501 seconds


That is using your precompiled Windows binaries in Win2k. Let me try some other P4 machines as well.

akruppa 2004-12-21 17:19

The c60 which took 133s or 128s with v0.87 now takes 120s or 133s with v0.88 - no great change. This is on a 500 MHz Pentium 3 Katmai, compiled with gcc 3.4.2 with the stock Makefile except for -march=pentium3.

Alex

Jeff Gilchrist 2004-12-21 18:08

Here are some more benchmarks when factoring:
[b]314159265358979323846264338328907078125461821459237067283727[/b]

It looks like the new version is slower than 0.87 on the P4.


[b]Pentium 4 (3.0 GHz, 1MB L2 Cache)[/b]
Linux 2.6.9 with gcc 3.4.2

v0.87 = 21.589 sec (-march=athlon)
v0.87 = 24.054 sec (-march=pentium4)
v0.88 = 26.297 sec (-march=athlon)
v0.88 = 26.451 sec (-march=pentium4)


[b]Pentium 4 (1.8 GHz, 256KB L2 Cache)[/b]
Windows 2000 cygwin with gcc 3.3.1

v0.87 = 37.702 sec (-march=pentium4)
v0.87 = 39.823 sec (-march=athlon)
v0.88 = 47.397 sec (-march=pentium4)
v0.88 = 54.129 sec (-march=athlon)


[b]Pentium 3 (900 MHz, 256KB L2 Cache)[/b]
Linux 2.4.20 with gcc 3.2.2

v0.88 = 53.581 sec (-march=athlon)
v0.87 = 56.148 sec (-march=athlon)
v0.87 = 57.432 sec (-march=pentium3)
v0.88 = 57.438 sec (-march=pentium3)

error404 2004-12-21 22:35

More times for 0.88 beta
 
Using the same c60, I found the following on this system.

Abit NF7-S motherboard (AthlonXP 3000+)
AMD-2167, 512K Cache, CPU BUS 166 MHz, 768M PC2100, Bus 333 MHz

worktodo.ini
314159265358979323846264338328907078125461821459237067283727

Windows 2000
0.87 = 12s
0.88 = 12s

Slackware-10, GCC-4.0-CURRENT
0.87 = 12s
0.88 = 12s

jasonp 2004-12-22 05:25

[QUOTE=Jeff Gilchrist]
It looks like the new version is slower than 0.87 on the P4.
[/QUOTE]
I've added a few optimizations to the polynomial switching code, and reverted some of the Intel CPU stuff. Updated beta is at

[url]www.boo.net/~jasonp/msieve088b2.tar.gz[/url]
[url]www.boo.net/~jasonp/msieve_beta2.exe[/url]

If you want to compare runtimes, I've attached the numbers I use for performance and QA testing. Runtimes for a 2GHz Opteron:

c60: 8 s
c70: 2m 22s
c80: 18m 47s
c95: 7h 5m
c100: 17h 30m

Visual Studio .NET seems to do a pretty good job with this program, even though things like prefetches are turned off. The c70 above takes 3m 5s on a 2.8GHz Northwood P4 when compiled with .NET and all the optimization switches I could find. The c60 runs in (I think) 17 seconds, which appears to be better than what gcc is managing.

Back when this was an ordinary QS utility (i.e. when it didn't use multiple polynomials), a c70 took 3.5 hours on a 1GHz K7. Now a c70 is done in under 5 minutes on the same machine!

Let me know how this version works out.
jasonp

BotXXX 2004-12-22 11:11

Running beta 2 on a Pentium M 1.4 GHz with 512 MB RAM and WinXP.

C60 -> 00:00:18 (but isn't it a C59?)
C70 -> 00:03:27

[code]
Wed Dec 22 11:42:16 2004 Msieve v. 0.88
Wed Dec 22 11:42:16 2004 random seeds: 5a6fbcb0 2b9696ba
Wed Dec 22 11:42:16 2004 factoring 23099884946009620096243803727122140213996633382129002610539 (59 digits)
Wed Dec 22 11:42:17 2004 using multiplier of 3
Wed Dec 22 11:42:17 2004 using sieve block of 32768
Wed Dec 22 11:42:17 2004 using a sieve bound of 51283 (2647 primes)
Wed Dec 22 11:42:17 2004 using large prime bound of 2359018
Wed Dec 22 11:42:33 2004 found 3163 relations (1449 full + 1714 partial), need 2743
Wed Dec 22 11:42:33 2004 begin with 12870 relations
Wed Dec 22 11:42:33 2004 reduce to 3194 relations in 2 passes
Wed Dec 22 11:42:33 2004 attempting to read 1449 full and 3194 partial relations
Wed Dec 22 11:42:33 2004 recovered 1449 full and 3194 partial relations
Wed Dec 22 11:42:33 2004 recovered 3785 polynomials
Wed Dec 22 11:42:33 2004 attempting to build 1714 cycles
Wed Dec 22 11:42:33 2004 found 1714 cycles in 1 passes
Wed Dec 22 11:42:33 2004 distribution of cycle lengths:
Wed Dec 22 11:42:33 2004 length 2 : 1714
Wed Dec 22 11:42:33 2004 largest cycle: 2 relations
Wed Dec 22 11:42:33 2004 2647 x 2711 system, weight 66061 (avg 24.37/col)
Wed Dec 22 11:42:33 2004 reduce to 2510 x 2574 in 3 passes
Wed Dec 22 11:42:33 2004 lanczos halted after 41 iterations
Wed Dec 22 11:42:33 2004 recovered 64 nontrivial dependencies
Wed Dec 22 11:42:34 2004 prp30 factor: 115342651612924837149281980471
Wed Dec 22 11:42:34 2004 prp30 factor: 200271838933700560014162995309
Wed Dec 22 11:42:34 2004 elapsed time 00:00:18
Wed Dec 22 11:42:44 2004
Wed Dec 22 11:42:44 2004
Wed Dec 22 11:42:44 2004 Msieve v. 0.88
Wed Dec 22 11:42:44 2004 random seeds: 3f6afeb5 18ebd953
Wed Dec 22 11:42:44 2004 factoring 9813956594010984314135286435578351358968896438035014755553156535972277 (70 digits)
Wed Dec 22 11:42:45 2004 using multiplier of 1
Wed Dec 22 11:42:45 2004 using sieve block of 32768
Wed Dec 22 11:42:45 2004 using a sieve bound of 249593 (11000 primes)
Wed Dec 22 11:42:45 2004 using large prime bound of 19967440
Wed Dec 22 11:46:06 2004 found 11402 relations (5626 full + 5776 partial), need 11096
Wed Dec 22 11:46:06 2004 begin with 58272 relations
Wed Dec 22 11:46:06 2004 reduce to 10792 relations in 2 passes
Wed Dec 22 11:46:06 2004 attempting to read 5626 full and 10792 partial relations
Wed Dec 22 11:46:07 2004 recovered 5626 full and 10792 partial relations
Wed Dec 22 11:46:07 2004 recovered 14308 polynomials
Wed Dec 22 11:46:07 2004 attempting to build 5776 cycles
Wed Dec 22 11:46:07 2004 found 5776 cycles in 1 passes
Wed Dec 22 11:46:07 2004 distribution of cycle lengths:
Wed Dec 22 11:46:07 2004 length 2 : 5776
Wed Dec 22 11:46:07 2004 largest cycle: 2 relations
Wed Dec 22 11:46:07 2004 11000 x 11064 system, weight 306354 (avg 27.69/col)
Wed Dec 22 11:46:07 2004 reduce to 9954 x 10018 in 3 passes
Wed Dec 22 11:46:08 2004 lanczos halted after 159 iterations
Wed Dec 22 11:46:08 2004 recovered 64 nontrivial dependencies
Wed Dec 22 11:46:12 2004 prp35 factor: 66942360975183897818097898965687583
Wed Dec 22 11:46:12 2004 prp36 factor: 146603084370584142319953689355125419
Wed Dec 22 11:46:12 2004 elapsed time 00:03:27
[/code]
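
On the C59 question: the digit counts and the factors reported in the log above can be checked with a few lines of arbitrary-precision arithmetic. This is only a sketch in Python (it is not part of msieve); the helper name `digits` is made up for illustration, and the numbers are copied from the posts and the log.

```python
# Sanity-check digit counts and the factors reported in the log above.
# `digits` is a hypothetical helper, not anything from msieve.

def digits(n: int) -> int:
    """Number of decimal digits of a positive integer."""
    return len(str(n))

# The number benchmarked earlier in the thread.
earlier = 314159265358979323846264338328907078125461821459237067283727
# The number from the first log, with the two prp30 factors it reports.
n = 23099884946009620096243803727122140213996633382129002610539
p = 115342651612924837149281980471
q = 200271838933700560014162995309

if __name__ == "__main__":
    print("earlier number:", digits(earlier), "digits")
    print("logged number: ", digits(n), "digits")
    print("p, q:", digits(p), "and", digits(q), "digits")
    # Confirms that the two reported factors multiply back to the input.
    print("p * q == n:", p * q == n)
```

The same check works for the 70-digit run by swapping in the number and the prp35/prp36 factors from the second log.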

error404 2004-12-22 14:30

0.88b2
 
Running

[url]www.boo.net/~jasonp/msieve088b2.tar.gz[/url]
[url]www.boo.net/~jasonp/msieve_beta2.exe[/url]


c59 = 23099884946009620096243803727122140213996633382129002610539
c70 = 9813956594010984314135286435578351358968896438035014755553156535972277
c79 = 6925808622746428593966930370141140291693107698007831907389943424636842394693639

G4-867 (Apple laptop) Darwin 7.7.0
c59 = 33s
c70 = 6m25s
c79 = 15m51s

AthlonXP-3000+(2172) Windows 2000
c59 = 10s
c70 = 2m07s
C80 = 5m43s

AMD64-3400+(2310) Fedora Core 3
c59 = 9s
c70 = 1m44s
c79 = 4m26s

Jeff Gilchrist 2004-12-22 16:48

This seems to be a definite improvement over 0.88b1:

Benchmarks when factoring:
[b]314159265358979323846264338328907078125461821459237067283727[/b]


[b]Pentium 4 (3.0 GHz, 1MB L2 Cache)[/b]
Linux 2.6.9 with gcc 3.4.2

v0.88b2 = 21.412 sec (-march=pentium4)
v0.87 = 21.589 sec (-march=athlon)
v0.88b2 = 24.025 sec (-march=athlon)
v0.87 = 24.054 sec (-march=pentium4)
v0.88 = 26.297 sec (-march=athlon)
v0.88 = 26.451 sec (-march=pentium4)


[b]Pentium 4 (1.8 GHz, 256KB L2 Cache)[/b]
Windows 2000 cygwin with gcc 3.3.1

v0.88b2 = 33.421 sec (-march=pentium4)
v0.88b2 = 34.213 sec (-march=athlon)
v0.87 = 37.702 sec (-march=pentium4)
v0.87 = 39.823 sec (-march=athlon)
v0.88 = 47.397 sec (-march=pentium4)
v0.88 = 54.129 sec (-march=athlon)


[b]Pentium 3 (900 MHz, 256KB L2 Cache)[/b]
Linux 2.4.20 with gcc 3.2.2

v0.88b2 = 42.366 sec (-march=athlon)
v0.88b2 = 42.912 sec (-march=pentium4)
v0.88 = 53.581 sec (-march=athlon)
v0.87 = 56.148 sec (-march=athlon)
v0.87 = 57.432 sec (-march=pentium3)
v0.88 = 57.438 sec (-march=pentium3)


I am now running benchmarks with the tests you posted above, so I will report back when they are done.

jasonp 2004-12-22 17:06

[QUOTE=error404]
c79 = 6925808622746428593966930370141140291693107698007831907389943424636842394693639

c79 = 15m51s

C80 = 5m43s

c79 = 4m26s[/QUOTE]
Note that this number has 5 digits worth of small factors.
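
For anyone who wants to see this for a given input, small factors can be pulled out by plain trial division before comparing timings. The following is a rough sketch in Python, not msieve's actual code; the function name `strip_small_factors` and the bound of 100000 are assumptions chosen for illustration.

```python
# Strip small prime factors by trial division so that timings can be
# compared on the remaining cofactor. `strip_small_factors` and the bound
# are assumptions for illustration, not msieve's internals.

def strip_small_factors(n: int, bound: int = 100_000):
    """Return (small_factors, cofactor) using trial division below `bound`.

    The cofactor is whatever is left after dividing out every factor
    found; it may itself be prime or 1.
    """
    factors = []
    d = 2
    while d < bound and d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1 if d == 2 else 2  # after 2, try odd candidates only
    return factors, n

if __name__ == "__main__":
    c79 = 6925808622746428593966930370141140291693107698007831907389943424636842394693639
    small, cofactor = strip_small_factors(c79)
    print("small factors:", small)
    print("cofactor digits:", len(str(cofactor)))
```

The run time of the sieve depends on the size of the cofactor, so a number that sheds several digits this way is not comparable to a hard composite of the original length.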

jasonp

Jeff Gilchrist 2004-12-22 19:43

OK, here are my latest benchmark results.

[CODE]Factoring:
C59 = 23099884946009620096243803727122140213996633382129002610539
C70 = 9813956594010984314135286435578351358968896438035014755553156535972277
C80 = 16925808622746428593966930370141140291693107698007831907389943424636842394693639


Pentium 4 (3.0 GHz, 1MB L2 Cache)
Linux 2.6.9 with gcc 3.4.2 (-march=pentium4)

C59 v0.88b2 = 00:00:17
C59 v0.87 = 00:00:18
C59 v0.88 = 00:00:21

C70 v0.88b2 = 00:02:55
C70 v0.87 = 00:02:57
C70 v0.88 = 00:03:26

C80 v0.88b2 = 00:19:41
C80 v0.87 = 00:20:25
C80 v0.88 = 00:21:54



Pentium 4 (1.8 GHz, 256KB L2 Cache)
Windows 2000 cygwin with gcc 3.3.1 (-march=pentium4)

C59 v0.88b2 = 00:00:28
C59 v0.87 = 00:00:29

C70 v0.88b2 = 00:05:17
C70 v0.87 = 00:05:29

C80 v0.88b2 = 00:35:37
C80 v0.87 = 00:37:43



Pentium 3 (900 MHz, 256KB L2 Cache)
Linux 2.4.20 with gcc 3.2.2 (-march=pentium3)

C59 v0.88b2 = 00:00:34
C59 v0.87 = 00:00:43

C70 v0.88b2 = 00:08:24
C70 v0.87 = 00:10:03

C80 v0.88b2 = 01:02:17
C80 v0.87 = 01:06:15[/CODE]

So 0.88b2 is much improved over the 0.88b1 code and also a little faster than the 0.87 code. Good work! I really like the new features you have put in for 0.88; they make it so much easier to test.
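
To put a number on "a little faster", the times in the table above can be turned into percentage reductions. This is a small Python sketch; `to_seconds` and `percent_faster` are hypothetical helpers written for this comparison, and the times are copied from the v0.87 and v0.88b2 rows above.

```python
# Relative change in run time between v0.87 and v0.88b2, using times
# copied from the table above. `to_seconds` and `percent_faster` are
# hypothetical helpers, not part of msieve.

def to_seconds(hms: str) -> int:
    """Convert an hh:mm:ss string to a number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return 3600 * h + 60 * m + s

def percent_faster(old: str, new: str) -> float:
    """Percentage reduction in run time going from `old` to `new`."""
    old_s, new_s = to_seconds(old), to_seconds(new)
    return 100.0 * (old_s - new_s) / old_s

# (machine, input) -> (v0.87 time, v0.88b2 time)
times = {
    ("P4 3.0 GHz", "C59"): ("00:00:18", "00:00:17"),
    ("P4 3.0 GHz", "C70"): ("00:02:57", "00:02:55"),
    ("P4 3.0 GHz", "C80"): ("00:20:25", "00:19:41"),
    ("P4 1.8 GHz", "C59"): ("00:00:29", "00:00:28"),
    ("P4 1.8 GHz", "C70"): ("00:05:29", "00:05:17"),
    ("P4 1.8 GHz", "C80"): ("00:37:43", "00:35:37"),
    ("P3 900 MHz", "C59"): ("00:00:43", "00:00:34"),
    ("P3 900 MHz", "C70"): ("00:10:03", "00:08:24"),
    ("P3 900 MHz", "C80"): ("01:06:15", "01:02:17"),
}

if __name__ == "__main__":
    for (machine, size), (old, new) in times.items():
        print(f"{machine} {size}: {percent_faster(old, new):.1f}% less time")
```

The short runs are only timed to the second, so the percentages for the C59 rows are coarse.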

jasonp 2004-12-24 13:16

Msieve 0.88
 
Version 0.88 is now available. The only difference between
0.88 beta 2 and 0.88 is that the current version prints the
date to the screen as the factorization is starting.

Everyone's Pentium times look a lot better than I'd feared,
so the performance of AMD and Intel CPUs is
pretty close (the AMDs are 10-15% faster on average).

If anyone wants to integrate msieve into some other application,
I'll be happy to help.

Happy holidays!
jasonp

