mersenneforum.org  

Old 2011-03-31, 12:21   #1
unconnected
 
May 2009
Russia, Moscow

2×5×251 Posts
GMP 5.0.1 vs GMP 4.1.4 benchmarking

I compared two GMP-ECM 6.3 builds under Linux, one compiled with GMP 5.0.1 and the other with GMP 4.1.4.
I got several strange results. Overall GMP 5.0.1 is better by 5-15%, but with B1=11e6 and some input sizes (tested at 100-300 digits) 4.1.4 was better. Some examples follow.

Code:
1. C121 from near-repdigits
GMP-ECM 6.3 [configured with GMP 4.1.4 and --enable-asm-redc] [ECM]
Input number is 1800485013924273616277080302416213714297702488568072032612888194660755338496630976045963259724803581322873645120627538429 (121 digits)
Using B1=11000000, B2=35133391030, polynomial Dickson(12), sigma=334640802
Step 1 took 36869ms
Step 2 took 19737ms

GMP-ECM 6.3 [configured with GMP 5.0.1 and --enable-asm-redc] [ECM]
Input number is 1800485013924273616277080302416213714297702488568072032612888194660755338496630976045963259724803581322873645120627538429 (121 digits)
Using B1=11000000, B2=35133391030, polynomial Dickson(12), sigma=2340904304
Step 1 took 35097ms
Step 2 took 33626ms

GMP 5.0.1 is significantly slower again on step 2.

2. C156 from aliquot seq 283752:i7004
GMP-ECM 6.3 [configured with GMP 4.1.4 and --enable-asm-redc] [ECM]
Input number is 150334450606011724019777200211010468220565590046299234402254345532711750018652367487259651931850319063498312781804011647293058067263942651704486104870980321 (156 digits)
Using B1=11000000, B2=35133391030, polynomial Dickson(12), sigma=4153245810
Step 1 took 55526ms
Step 2 took 26975ms

GMP-ECM 6.3 [configured with GMP 5.0.1 and --enable-asm-redc] [ECM]
Input number is 150334450606011724019777200211010468220565590046299234402254345532711750018652367487259651931850319063498312781804011647293058067263942651704486104870980321 (156 digits)
Using B1=11000000, B2=35133391030, polynomial Dickson(12), sigma=2955949299
Step 1 took 57614ms
Step 2 took 39257ms

Again step 2 with GMP 5.0.1 is much slower.

3. C209 from near-repdigits
GMP-ECM 6.3 [configured with GMP 4.1.4 and --enable-asm-redc] [ECM]
Input number is  99999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999899999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999  (209 digits)
Using B1=11000000, B2=35133391030, polynomial Dickson(12), sigma=2560444052
Step 1 took 75055ms
Step 2 took 36402ms

GMP-ECM 6.3 [configured with GMP 5.0.1 and --enable-asm-redc] [ECM]
Input number is  99999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999899999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999  (209 digits)
Using B1=11000000, B2=35133391030, polynomial Dickson(12), sigma=3908589128
Step 1 took 76103ms
Step 2 took 46634ms

Step 2 with GMP 5.0.1 is slower by 10sec.

With B1=3e6 everything is OK: 5.0.1 is slightly better than 4.1.4
1. C121
Step 1 took 9562ms
Step 2 took 4803ms
vs.
Step 1 took 10009ms
Step 2 took 6219ms

2. C156
Step 1 took 15440ms
Step 2 took 6315ms
vs.
Step 1 took 15102ms
Step 2 took 8532ms

3. C209
Step 1 took 20846ms
Step 2 took 8188ms
vs.
Step 1 took 20306ms
Step 2 took 11598ms
I repeated the tests 10 times and always got the same results. What's wrong?
Compile options: --enable-openmp --with-gmp=/usr/local/ --enable-shellcmd --enable-sse2 --enable-asm-redc
Test system: Xeon E5620 2.40GHz, CentOS 5.5 x86_64 on a 2.6.18 kernel
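
To put numbers on the B1=11e6 runs above, here is a minimal Python sketch that computes the relative change in step 1 and step 2 time when going from the GMP 4.1.4 build to the GMP 5.0.1 build. The timings are copied from the logs above; the names `timings`, `percent_change` and `compare` are made up for this illustration and are not part of GMP-ECM.

```python
# Timings in ms copied from the B1=11e6 logs above: (step 1, step 2).
timings = {
    "C121": {"4.1.4": (36869, 19737), "5.0.1": (35097, 33626)},
    "C156": {"4.1.4": (55526, 26975), "5.0.1": (57614, 39257)},
    "C209": {"4.1.4": (75055, 36402), "5.0.1": (76103, 46634)},
}


def percent_change(old_ms, new_ms):
    """Relative change from old to new in percent (positive means slower)."""
    return 100.0 * (new_ms - old_ms) / old_ms


def compare(table):
    """Return (name, step 1 change, step 2 change) for each composite."""
    rows = []
    for name, runs in table.items():
        s1_old, s2_old = runs["4.1.4"]
        s1_new, s2_new = runs["5.0.1"]
        rows.append((name,
                     percent_change(s1_old, s1_new),
                     percent_change(s2_old, s2_new)))
    return rows


for name, d1, d2 in compare(timings):
    print(f"{name}: step 1 {d1:+.1f}%, step 2 {d2:+.1f}%")
```

A positive value means the GMP 5.0.1 build was slower. Each pair is a single run with a different sigma, so the percentages are only a rough guide.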

Last fiddled with by unconnected on 2011-03-31 at 12:23
Old 2011-04-01, 02:04   #2
Syd
 
Sep 2008
Krefeld, Germany

3468 Posts

That's exactly what I found some time ago. Especially in step 2, GMP 4.x is a lot faster, and I have no idea why.
The fastest combination for my Phenom II 1090T is GMP 4.3.2 combined with GMP-ECM 6.3, all compiled with -march=barcelona and, of course, linked statically.

For large numbers > ~ 400 digits linking against gwnum gave a huge speedup.

Table attached: all times in ms, measured on a Phenom II at 3.6 GHz, Linux kernel 2.6.35, 64-bit
[Attached image: ecm.png]

Last fiddled with by Syd on 2011-04-01 at 02:10
Old 2011-04-01, 12:40   #3
unconnected
 
May 2009
Russia, Moscow

2·5·251 Posts

Quote:
Originally Posted by Syd View Post
For large numbers > ~ 400 digits linking against gwnum gave a huge speedup.
I think that is only the case for 2^n-1 and 2^n+1 numbers.

I decided to recompile the binaries from scratch, and that raised some more questions.
Why is ecm-params.h.athlon64 used instead of ecm-params.h.core2?
Why were SSE2 instructions not used in the NTT code?

Code:
config.status: linking ecm-params.h.athlon64 to ecm-params.h
config.status: linking mul_fft-params.h.athlon64 to mul_fft-params.h
config.status: executing depfiles commands
config.status: executing libtool commands
configure: Configuration:
configure: Build for host type x86_64-unknown-linux-gnu
configure: CC=gcc -std=gnu99, CFLAGS=-W -Wall -Wundef -O2 -pedantic -m64 -mtune=core2 -march=core2
configure: Linking GMP with /usr/local//lib/libgmp.a
configure: Using asm redc code from directory x86_64
configure: Not using SSE2 instructions in NTT code
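
SSE2 is part of the base x86_64 instruction set, so the CPU itself should not be the reason for the "Not using SSE2" line above. As a quick sanity check, the following Python sketch reads the `flags` line from `/proc/cpuinfo` on Linux and reports whether the kernel lists `sse2`. The helper names `cpu_flags` and `has_sse2` are hypothetical, and the check says nothing about how the GMP-ECM configure script makes its own decision.

```python
def cpu_flags(cpuinfo_text):
    """Return the flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_sse2(cpuinfo_text):
    """True if the 'sse2' flag is listed."""
    return "sse2" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or /proc is unavailable
    print("sse2 reported by kernel:", has_sse2(text))
```

If this prints `True` while configure still reports that SSE2 is not used, the problem is more likely in the configure-time detection than in the hardware.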
Old 2011-04-01, 16:12   #4
jasonp
Tribal Bullet
 
Oct 2004

67148 Posts

Quote:
Originally Posted by unconnected View Post
Why were SSE2 instructions not used in the NTT code?
The developers have had a little trouble detecting SSE2 across a wide enough range of platforms. Is this with the latest SVN? IIRC it has fixes for a problem somewhat like yours.
Old 2011-04-02, 16:48   #5
ATH
Einyen
 
Dec 2003
Denmark

3·5·199 Posts

How are you compiling GMP 4.3.2 for 64-bit?

I get this error:
Quote:
configure: error: Oops, mp_limb_t is 32 bits, but the assembler code
in this configuration expects 64 bits.
You appear to have set $CFLAGS, perhaps you also need to tell GMP the
intended ABI, see "ABI and ISA" in the manual.
I compile in Mingw64 with:
./configure CC=gcc CFLAGS="-O2 -pedantic -m64 -std=gnu99 -mtune=core2 -march=core2" ABI=64 --build=x86_64-w64-mingw32
I also tried just:
./configure ABI=64
and variations.

I read on the GMP website: "Gcc 4.3.2 miscompiles GMP on 64-bit machines", but I'm using gcc 4.6.0.
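
One possible explanation, offered as an assumption and not confirmed anywhere in this thread: 64-bit Windows toolchains such as MinGW-w64 use a 32-bit `long`, and unless it is configured otherwise GMP defines `mp_limb_t` as `unsigned long`, which would give a 32-bit limb even with `-m64`. The Python sketch below only shows the width of the C types for the platform the Python interpreter was built for. It does not inspect the MinGW compiler, and the function name `type_sizes` is made up for this example.

```python
import ctypes


def type_sizes():
    """Width in bits of a few C types, as seen by this Python build."""
    return {
        "long": ctypes.sizeof(ctypes.c_long) * 8,
        "long long": ctypes.sizeof(ctypes.c_longlong) * 8,
        "pointer": ctypes.sizeof(ctypes.c_void_p) * 8,
    }


sizes = type_sizes()
for name, bits in sizes.items():
    print(f"{name}: {bits} bits")

# On 64-bit Windows (LLP64), long stays at 32 bits while pointers are 64 bits.
if sizes["long"] < sizes["pointer"]:
    print("long is narrower than a pointer (LLP64-style data model)")
```

On 64-bit Linux `long` and pointers are both 64 bits wide, which would fit the fact that the same GMP version builds there without this error.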

Last fiddled with by ATH on 2011-04-02 at 16:49
Old 2011-04-03, 16:16   #6
ATH
Einyen
 
Dec 2003
Denmark

101110101001₂ Posts

Here is my 32-bit test of GMP 4.3.2 vs 5.0.1 and MPIR: gmp4test.html

I can't see the effect you describe. On a Core 2 the GMP 4.3.2 binary is a lot slower than both GMP 5.0.1 and MPIR 2.3.0/2.2.1. On a Pentium 4 it's only slightly slower than GMP 5.0.1 and faster than MPIR.

If you have a link to GMP 4.1.4 I'm willing to test it.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.