mersenneforum.org

unconnected 2011-03-31 12:21

GMP 5.0.1 vs GMP 4.1.4 benchmarking
 
I compared two GMP-ECM 6.3 builds under Linux, one compiled with GMP 5.0.1 and the other with GMP 4.1.4.
I got several strange results. Overall, GMP 5.0.1 is faster by 5-15%, but with B1=11e6 and some input sizes (tested 100-300 digits) 4.1.4 was faster. Some examples follow.

[CODE]1. C121 from near-repdigits
GMP-ECM 6.3 [configured with GMP 4.1.4 and --enable-asm-redc] [ECM]
Input number is 1800485013924273616277080302416213714297702488568072032612888194660755338496630976045963259724803581322873645120627538429 (121 digits)
Using B1=11000000, B2=35133391030, polynomial Dickson(12), sigma=334640802
Step 1 took 36869ms
Step 2 took 19737ms

GMP-ECM 6.3 [configured with GMP 5.0.1 and --enable-asm-redc] [ECM]
Input number is 1800485013924273616277080302416213714297702488568072032612888194660755338496630976045963259724803581322873645120627538429 (121 digits)
Using B1=11000000, B2=35133391030, polynomial Dickson(12), sigma=2340904304
Step 1 took 35097ms
Step 2 took 33626ms

GMP 5.0.1 is significantly slower again on step 2.

2. C156 from aliquot seq 283752:i7004
GMP-ECM 6.3 [configured with GMP 4.1.4 and --enable-asm-redc] [ECM]
Input number is 150334450606011724019777200211010468220565590046299234402254345532711750018652367487259651931850319063498312781804011647293058067263942651704486104870980321 (156 digits)
Using B1=11000000, B2=35133391030, polynomial Dickson(12), sigma=4153245810
Step 1 took 55526ms
Step 2 took 26975ms

GMP-ECM 6.3 [configured with GMP 5.0.1 and --enable-asm-redc] [ECM]
Input number is 150334450606011724019777200211010468220565590046299234402254345532711750018652367487259651931850319063498312781804011647293058067263942651704486104870980321 (156 digits)
Using B1=11000000, B2=35133391030, polynomial Dickson(12), sigma=2955949299
Step 1 took 57614ms
Step 2 took 39257ms

Again step 2 with GMP 5.0.1 is much slower.

3. C209 from near-repdigits
GMP-ECM 6.3 [configured with GMP 4.1.4 and --enable-asm-redc] [ECM]
Input number is 99999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999899999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999 (209 digits)
Using B1=11000000, B2=35133391030, polynomial Dickson(12), sigma=2560444052
Step 1 took 75055ms
Step 2 took 36402ms

GMP-ECM 6.3 [configured with GMP 5.0.1 and --enable-asm-redc] [ECM]
Input number is 99999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999899999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999 (209 digits)
Using B1=11000000, B2=35133391030, polynomial Dickson(12), sigma=3908589128
Step 1 took 76103ms
Step 2 took 46634ms

Step 2 with GMP 5.0.1 is slower by 10sec.

With B1=3e6 all is OK - 5.0.1 is slightly better than 4.1.4
1. C121
Step 1 took 9562ms
Step 2 took 4803ms
vs.
Step 1 took 10009ms
Step 2 took 6219ms

2. C156
Step 1 took 15440ms
Step 2 took 6315ms
vs.
Step 1 took 15102ms
Step 2 took 8532ms

3. C209
Step 1 took 20846ms
Step 2 took 8188ms
vs.
Step 1 took 20306ms
Step 2 took 11598ms
[/CODE]I repeated the tests 10 times and always got the same results. What's wrong?
Compile options: --enable-openmp --with-gmp=/usr/local/ --enable-shellcmd --enable-sse2 --enable-asm-redc
Test system: Xeon E5620 2.40GHz, CentOS 5.5 x86_64 on a 2.6.18 kernel
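
To put numbers on the B1=11e6 comparison, here is a minimal sketch that parses the "Step N took Xms" lines and computes the relative change from the GMP 4.1.4 build to the GMP 5.0.1 build. The timings are copied from the runs above; the names `parse_steps`, `percent_change`, `compare` and `RUNS` are made up for this illustration. Note that the two builds were run with different sigma values, so this is not a perfectly controlled comparison.

```python
import re

# Matches lines such as "Step 2 took 19737ms" in GMP-ECM output.
STEP_RE = re.compile(r"Step (\d) took (\d+)ms")


def parse_steps(log_text):
    """Return {step_number: milliseconds} for every 'Step N took Xms' line."""
    return {int(step): int(ms) for step, ms in STEP_RE.findall(log_text)}


def percent_change(old_ms, new_ms):
    """Relative change from old to new in percent (positive means slower)."""
    return 100.0 * (new_ms - old_ms) / old_ms


# (GMP 4.1.4 log, GMP 5.0.1 log) per composite, B1=11e6, from the post above.
RUNS = {
    "C121": ("Step 1 took 36869ms\nStep 2 took 19737ms",
             "Step 1 took 35097ms\nStep 2 took 33626ms"),
    "C156": ("Step 1 took 55526ms\nStep 2 took 26975ms",
             "Step 1 took 57614ms\nStep 2 took 39257ms"),
    "C209": ("Step 1 took 75055ms\nStep 2 took 36402ms",
             "Step 1 took 76103ms\nStep 2 took 46634ms"),
}


def compare(runs):
    """Return {composite: {step: percent change from 4.1.4 to 5.0.1}}."""
    rows = {}
    for name, (old_log, new_log) in runs.items():
        old, new = parse_steps(old_log), parse_steps(new_log)
        rows[name] = {s: percent_change(old[s], new[s]) for s in (1, 2)}
    return rows


if __name__ == "__main__":
    for name, changes in compare(RUNS).items():
        print(f"{name}: step 1 {changes[1]:+.1f}%, step 2 {changes[2]:+.1f}%")
```

On these figures step 1 stays within about 5% either way, while step 2 with GMP 5.0.1 is roughly 28% to 70% slower, with the gap shrinking as the input gets larger.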

Syd 2011-04-01 02:04

1 Attachment(s)
That's exactly what I figured out some time ago. Especially on step 2, GMP 4.x is a lot faster, and I have no idea why.
The fastest combination for my Phenom II 1090T is GMP 4.3.2 combined with GMP-ECM 6.3, all compiled with -march=barcelona and, of course, linked statically.

For large numbers > ~ 400 digits linking against gwnum gave a huge speedup.

Table attached: all times in ms, measured on a Phenom II at 3.6 GHz, Linux kernel 2.6.35, 64-bit

unconnected 2011-04-01 12:40

[QUOTE=Syd;257229]For large numbers > ~ 400 digits linking against gwnum gave a huge speedup.
[/QUOTE]
I think it is only for 2^n-1 and 2^n+1 numbers.

I decided to recompile the binaries from scratch, and some new questions came up.
Why is ecm-params.h.athlon64 used instead of ecm-params.h.core2?
Why were SSE2 instructions not used in the NTT code?

[CODE]
config.status: linking ecm-params.h.athlon64 to ecm-params.h
config.status: linking mul_fft-params.h.athlon64 to mul_fft-params.h
config.status: executing depfiles commands
config.status: executing libtool commands
configure: Configuration:
configure: Build for host type x86_64-unknown-linux-gnu
configure: CC=gcc -std=gnu99, CFLAGS=-W -Wall -Wundef -O2 -pedantic -m64 -mtune=core2 -march=core2
configure: Linking GMP with /usr/local//lib/libgmp.a
configure: Using asm redc code from directory x86_64
configure: Not using SSE2 instructions in NTT code
[/CODE]
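
As a quick sanity check when configure reports "Not using SSE2 instructions in NTT code", it can help to confirm that the kernel itself lists `sse2` among the CPU flags. The sketch below only reads `/proc/cpuinfo`-style text; it says nothing about how the GMP-ECM configure script makes its decision, and the function names `cpu_flags` and `has_sse2` are made up for this illustration.

```python
def cpu_flags(cpuinfo_text):
    """Collect the feature flags from /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only the "flags" lines carry the feature list on x86 Linux.
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_sse2(cpuinfo_text):
    """True if 'sse2' appears in any flags line of the given text."""
    return "sse2" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            text = fh.read()
    except OSError:
        print("no readable /proc/cpuinfo on this system")
    else:
        print("sse2 reported by kernel:", has_sse2(text))
```

If the flag is present and configure still declines to use SSE2, the problem is more likely in the detection step of the build than in the hardware.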

jasonp 2011-04-01 16:12

[QUOTE=unconnected;257275]
Why were SSE2 instructions not used in the NTT code?
[/QUOTE]
The developers have had a little trouble detecting SSE2 across a wide enough range of platforms. Is this with the latest SVN? IIRC it has fixes for a problem somewhat like yours.

ATH 2011-04-02 16:48

How are you compiling GMP 4.3.2 for 64-bit?

I get this error:
[QUOTE]configure: error: Oops, mp_limb_t is 32 bits, but the assembler code
in this configuration expects 64 bits.
You appear to have set $CFLAGS, perhaps you also need to tell GMP the
intended ABI, see "ABI and ISA" in the manual.[/QUOTE]

I compile in Mingw64 with:
./configure CC=gcc CFLAGS="-O2 -pedantic -m64 -std=gnu99 -mtune=core2 -march=core2" ABI=64 --build=x86_64-w64-mingw32
I also tried just:
./configure ABI=64
and variations.

I read on the GMP website: "Gcc 4.3.2 miscompiles GMP on 64-bit machines", but I'm using gcc 4.6.0.
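
One thing that may be worth ruling out for the "mp_limb_t is 32 bits" message: 64-bit Windows toolchains use the LLP64 data model, where `long` stays 32 bits even though pointers are 64 bits. Whether that is what trips GMP 4.3.2's configure here is an assumption, not something confirmed in this thread. The sketch below just prints the widths of a few C types; it reports the ABI of the Python interpreter that runs it, which is not necessarily the ABI of the gcc in use, and `c_type_bits` is a made-up helper name.

```python
import ctypes


def c_type_bits():
    """Bit widths of a few C types under the running interpreter's C ABI."""
    return {
        "long": 8 * ctypes.sizeof(ctypes.c_long),
        "long long": 8 * ctypes.sizeof(ctypes.c_longlong),
        "pointer": 8 * ctypes.sizeof(ctypes.c_void_p),
    }


if __name__ == "__main__":
    for name, bits in c_type_bits().items():
        print(f"{name}: {bits} bits")
```

A 64-bit pointer together with a 32-bit `long` indicates LLP64; on 64-bit Linux both are normally 64 bits.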

ATH 2011-04-03 16:16

Here is my 32-bit test of GMP 4.3.2 vs 5.0.1 and MPIR: [URL="http://www.hoegge.dk/mersenne/gmp4test.html"]gmp4test.html[/URL]

I can't see the effect you describe. On a Core 2 the GMP 4.3.2 binary is a lot slower than both GMP 5.0.1 and MPIR 2.3.0/2.2.1. On a Pentium 4 it's only slightly slower than GMP 5.0.1 and faster than MPIR.

If you have a link to GMP 4.1.4 I'm willing to test it.


All times are UTC.