mersenneforum.org  

Old 2009-04-27, 15:40   #1
Jeff Gilchrist

Jun 2003
Ottawa, Canada

2225₈ Posts
gmp-ecm 6.2.3 discussion & benchmarks

Instead of putting everything into the binaries thread, why don't we start a new thread for 6.2.3: problems, benchmarks, and other details.

The configure system has changed since 6.2.2. It used to detect my core2/penryn system as pentium3-pc-cygwin by default; it now detects it as i686-pc-cygwin.

If I manually pass "--build=pentium4", it shows up as i786-pc-cygwin. I think Alex was saying he is still working on SSE2 detection support, but it would be good if it could at least detect the system as a pentium4 or better. As you will see below, that code path is faster.

GMP 4.3.0 detects the system as: core2-pc-cygwin
MPIR 1.1.1 detects the system as: penryn-pc-cygwin

Jeff.

Last fiddled with by Jeff Gilchrist on 2009-04-27 at 15:40
Old 2009-04-27, 15:52   #2
Jeff Gilchrist

Jun 2003
Ottawa, Canada

3·17·23 Posts
GMP-ECM Win32 Benchmarks

On a Core2 Q9550 system at 3.4GHz using the following:
./configure --with-gmp=/home/Jeff/gmp-4.3.0/ --enable-asm-redc --enable-sse2

and

./configure --with-gmp=/home/Jeff/gmp-4.3.0/ --enable-asm-redc --enable-sse2 --build=pentium4

These are the results using cygwin to create Win32 binaries.

Code:
MSVC 32bit:

GMP-ECM 6.2.3 [powered by GMP 4.2.1_MPIR_1.1.1] [ECM]
Input number is 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561 (85 digits)
Using B1=20000000, B2=2158570060, polynomial Dickson(6), sigma=980060817
Step 1 took 84911ms
Step 2 took 4477ms


cygwin 32bit (pentium4) with MPIR 1.1.1:

GMP-ECM 6.2.3 [powered by GMP 4.2.1] [ECM]
Input number is 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561 (85 digits)
Using B1=20000000, B2=2158570060, polynomial Dickson(6), sigma=980060817
Step 1 took 79015ms
Step 2 took 2745ms


cygwin 32bit (pentium4) with GMP 4.3.0:

GMP-ECM 6.2.3 [powered by GMP 4.3.0] [ECM]
Input number is 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561 (85 digits)
Using B1=20000000, B2=2158570060, polynomial Dickson(6), sigma=980060817
Step 1 took 81245ms
Step 2 took 2792ms


cygwin 32bit (default config i686) with MPIR 1.1.1:

GMP-ECM 6.2.3 [powered by GMP 4.2.1] [ECM]
Input number is 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561 (85 digits)
Using B1=20000000, B2=2158570060, polynomial Dickson(6), sigma=980060817
Step 1 took 94038ms
Step 2 took 2839ms


cygwin 32bit (default config i686) with GMP 4.3.0:

GMP-ECM 6.2.3 [powered by GMP 4.3.0] [ECM]
Input number is 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561 (85 digits)
Using B1=20000000, B2=2158570060, polynomial Dickson(6), sigma=980060817
Step 1 took 94506ms
Step 2 took 2667ms
So there is not really any difference between MPIR 1.1.1 and GMP 4.3.0, which makes sense I guess, because most of the new speedups were aimed at 64-bit AMD and Core2.
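To put a number on the gap between the two build targets, the Step 1 times above can simply be divided. This is only a sketch: the dictionary layout and the `step1_speedup` name are made up for illustration and are not part of GMP-ECM, and each figure comes from a single run, so small differences should not be over-read.

```python
# (Step 1, Step 2) timings in milliseconds, copied from the cygwin runs above.
timings_ms = {
    ("pentium4", "MPIR 1.1.1"): (79015, 2745),
    ("pentium4", "GMP 4.3.0"): (81245, 2792),
    ("i686", "MPIR 1.1.1"): (94038, 2839),
    ("i686", "GMP 4.3.0"): (94506, 2667),
}


def step1_speedup(library):
    """Ratio of i686 to pentium4 Step 1 time (>1 means the pentium4 build is faster)."""
    slow = timings_ms[("i686", library)][0]
    fast = timings_ms[("pentium4", library)][0]
    return slow / fast


for lib in ("MPIR 1.1.1", "GMP 4.3.0"):
    print(f"{lib}: pentium4 build is {step1_speedup(lib):.2f}x faster in Step 1")
```

On these runs the pentium4 build comes out roughly 1.19x (MPIR) and 1.16x (GMP) faster in Step 1, while the MPIR and GMP times for the same build target stay within about 3% of each other.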

Last fiddled with by Jeff Gilchrist on 2009-04-27 at 15:55
Old 2009-04-27, 15:59   #3
Jeff Gilchrist

Jun 2003
Ottawa, Canada

3·17·23 Posts
GMP-ECM Win64 Benchmarks

Looks like that assembler padding/re-alignment change you made did make a difference with the core2 speed.

These are both MSVC 64bit compiled binaries.

Code:
GMP-ECM 6.2.2 [powered by GMP 4.2.1_MPIR_1.0.0] [ECM]
Input number is 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561 (85 digits)
Using B1=20000000, B2=2158570060, polynomial Dickson(6), sigma=980060817
Step 1 took 42479ms
Step 2 took 3026ms
real    0m46.153s


GMP-ECM 6.2.3 [powered by GMP 4.2.1_MPIR_1.1.1] [ECM]
Input number is 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561 (85 digits)
Using B1=20000000, B2=2158570060, polynomial Dickson(6), sigma=980060817
Step 1 took 41730ms
Step 2 took 3057ms
real    0m44.862s
Old 2009-04-29, 09:56   #4
Yamato

Sep 2005
Berlin

2×3×11 Posts

I have noticed that the time for both stage 1 and stage 2 increases considerably for numbers > 2^640 if a binary with mulredc code is used:

Code:
// a binary with enabled mulredc:

> echo '2^640-305' | ./ecm 3e6
GMP-ECM 6.2.3 [powered by GMP 4.3.0] [ECM]
Input number is 2^640-305 (193 digits)
Using B1=3000000, B2=5706890290, polynomial Dickson(6), sigma=2697909633
Step 1 took 15128ms
Step 2 took 5973ms

> echo '2^640+115' | ./ecm 3e6
GMP-ECM 6.2.3 [powered by GMP 4.3.0] [ECM]
Input number is 2^640+115 (193 digits)
Using B1=3000000, B2=5706890290, polynomial Dickson(6), sigma=2510852341
Step 1 took 17077ms
Step 2 took 6324ms
2^640 ~ 5*10^192 also seems to be the crossover point between 'mulredc' and 'non-mulredc' binaries.
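One possible reading, stated here as an assumption rather than anything confirmed in this thread: 2^640 is a machine-word boundary, so the two test inputs differ by one word in size. The sketch below just counts words; `limbs_needed` is a hypothetical helper name, and the word sizes tried (32 and 64 bits) are guesses, since the post does not say which build was used.

```python
def limbs_needed(n, limb_bits):
    """Number of machine words of limb_bits bits needed to hold n."""
    return -(-n.bit_length() // limb_bits)  # ceiling division


below = 2**640 - 305   # first input in the log above
above = 2**640 + 115   # second input in the log above

print(len(str(below)), len(str(above)))   # decimal digit counts
print(f"{2**640:.3e}")                    # magnitude of 2^640
for bits in (32, 64):
    print(bits, limbs_needed(below, bits), limbs_needed(above, bits))
```

Both inputs have 193 decimal digits, but the second one needs 21 words instead of 20 at 32 bits per word (11 instead of 10 at 64 bits). Whether that extra word is what causes the slowdown is not established here.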
Old 2009-04-30, 18:49   #5
fivemack
(loop (#_fork))

Feb 2006
Cambridge, England

1100011101111₂ Posts
Small logging complaint

I'm currently running a p+1 job on a C7252 with b1=1e8, default b2, -maxmem 1024, and -v.

It's taken 2653 minutes so far, and the last few thousand lines of logs have been, after a few at the top:

Code:
Using lmax = 65536 with two pass NTT which takes about 875MB of memory
Using B1=100000000, B2=3951153670810, polynomial x^1, x0=823969581
P = 59053995, l = 65536, s_1 = 32076, k = s_2 = 640, m_1 = -3
Step 1 took 52295356ms
Computing F from factored S_1 took 94586ms
Computing h_x and h_y took 54619ms
Computing DCT-I of h_x took 4277ms
Computing DCT-I of h_y took 4316ms
Computing g_x took 102942ms
Computing g_x*h_x took 30038ms
Computing g_y took 102943ms
Computing g_y*h_y took 9740ms
Computing gcd of coefficients and N took 28050ms
endless repetitions of the form:

Code:
Computing g_x took 103178ms
Computing g_x*h_x took 30050ms
Computing g_y took 103055ms
Computing g_y*h_y took 9736ms
Computing gcd of coefficients and N took 28046ms
Is there any way that you could add some indication of how many of these lines there are going to be, and maybe even an ETA?
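Until the program prints such a counter, a rough ETA can be worked out by hand from the figures in the log. The sketch below assumes that the five repeating lines make up one iteration, that there are k = 640 iterations in total, and that the 2653 minutes are wall-clock time comparable to the logged timings. The variable names are made up for illustration and none of this is GMP-ECM code.

```python
# Per-iteration timings in ms, copied from the repeating log block above.
iteration_ms = {
    "g_x": 103178,
    "g_x*h_x": 30050,
    "g_y": 103055,
    "g_y*h_y": 9736,
    "gcd": 28046,
}
# One-off timings before the loop, from the first log block.
step1_ms = 52295356
setup_ms = 94586 + 54619 + 4277 + 4316

k = 640             # "k = s_2 = 640" in the log
elapsed_min = 2653  # minutes reported so far

per_iter_min = sum(iteration_ms.values()) / 60000
loop_min = elapsed_min - (step1_ms + setup_ms) / 60000
done = int(loop_min // per_iter_min)
remaining_hours = (k - done) * per_iter_min / 60

print(f"{per_iter_min:.2f} min per iteration, about {done} of {k} done, "
      f"roughly {remaining_hours:.0f} h left")
```

Under those assumptions each iteration takes about 4.6 minutes, about 389 of the 640 are done, and roughly 19 hours remain.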

Last fiddled with by fivemack on 2009-04-30 at 18:49
Old 2009-04-30, 18:59   #6
henryzz
Just call me Henry

"David"
Sep 2007
Cambridge (GMT/BST)

2·5·587 Posts

Just look at the k value.
On that one it is 640, which is the largest I have ever seen.
Old 2009-04-30, 22:22   #7
akruppa

"Nancy"
Aug 2002
Alexandria

2467₁₀ Posts

Adding a running counter to the output is trivial: look for the loop

for (l = 0; l < params->s_2; l++)

in pm1fs2.c (it occurs in four functions: P-1 and P+1, each with and without NTT; you're running P+1 with NTT) and add some output per iteration. (Edit: outputf (OUTPUT_VERBOSE, "bla"); prints only when the -v parameter is given.) I'll add something to the SVN code.

A k value that high cries out for more memory... with 4GB, k would be only 40.

Alex

Last fiddled with by akruppa on 2009-04-30 at 22:45 Reason: forgot to write in which file... stupid
Old 2009-04-30, 22:51   #8
fivemack
(loop (#_fork))

Feb 2006
Cambridge, England

13·491 Posts

I realise that k=640 is generally a sign that you've got the parameter choice wrong; I was running a p+1 job with -maxmem 6144 on another CPU and have only 8G on this machine, and hadn't realised that -maxmem {N/M} took M^2 times as long ... this is why an ETA would be nice.

(I'm at the 441st of presumably 640 GCDs, so I'll leave it another 24 hours and it'll be done)
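The M^2 rule of thumb can be checked against the two data points in this thread (k = 640 at -maxmem 1024, and Alex's k = 40 at 4GB). The function below is a back-of-the-envelope model under that assumption, not how GMP-ECM actually chooses k; `estimated_k` and its parameters are hypothetical names. The real choice presumably moves in discrete steps, so intermediate values are only indicative.

```python
def estimated_k(maxmem_mb, ref_mem_mb=1024, ref_k=640):
    """Rough k for a given -maxmem, scaled from the k=640 at 1024 MB run above.

    Assumes k grows with the square of the memory reduction factor.
    """
    return ref_k * (ref_mem_mb / maxmem_mb) ** 2


for mem in (1024, 2048, 4096):
    print(mem, estimated_k(mem))
```

With 4096 MB the model gives k = 40, matching the figure quoted above for 4GB.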
Old 2009-04-30, 22:55   #9
ATH
Einyen

Dec 2003
Denmark

C3A₁₆ Posts

Quote:
Originally Posted by Jeff Gilchrist
The configure system changed since 6.2.2 where it used to detect my core2/penryn system as a pentium3-pc-cygwin by default it now detects it as i686-pc-cygwin.

My old pentium4 prescott is also detected as i686-pc-mingw32 by GMP-ECM 6.2.3; with --build=pentium4-pc-mingw32 it is reported as i786-pc-mingw32, while GMP 4.3.0 identifies it correctly as a pentium4.

Btw, are there any plans to raise the max B1 for P+1 above 2^32-1, or is this a huge job?

Last fiddled with by ATH on 2009-04-30 at 23:06
Old 2009-05-01, 06:23   #10
10metreh

Nov 2008

2×3³×43 Posts

I notice Alex's avatar has changed back. Remind me, did it always have a red shirt?
Old 2009-05-01, 09:43   #11
Andi47

Oct 2004
Austria

2×17×73 Posts
Benchmarks on a P4 (3.4 GHz)

ECM-6.2 with GMP-4.2.2, built with ./configure --with-gmp=/usr/local

Code:
GMP-ECM 6.2 [powered by GMP 4.2.2] [ECM]
Input number is 18816141541139222511309815439534127651955651212035224345737809063431028813620968356115158131612051597 (101 digits)
Using B1=3000000, B2=5706890290, polynomial Dickson(6), sigma=180303176
Step 1 took 28469ms
Step 2 took 12938ms
Run 2 out of 3:
Using B1=3000000, B2=5706890290, polynomial Dickson(6), sigma=580726752
Step 1 took 29375ms
Step 2 took 13656ms
Run 3 out of 3:
Using B1=3000000, B2=5706890290, polynomial Dickson(6), sigma=3060674203
Step 1 took 28969ms
Step 2 took 13765ms
ECM-6.2.3 with GMP-4.3.0; ./configure --with-gmp=/usr/local --enable-asm-redc --build=pentium4

Code:
GMP-ECM 6.2.3 [powered by GMP 4.3.0] [ECM]
Input number is 18816141541139222511309815439534127651955651212035224345737809063431028813620968356115158131612051597 (101 digits)
Using B1=3000000, B2=5706890290, polynomial Dickson(6), sigma=2628599422
Step 1 took 23078ms
Step 2 took 10938ms
Run 2 out of 3:
Using B1=3000000, B2=5706890290, polynomial Dickson(6), sigma=2541381111
Step 1 took 23453ms
Step 2 took 10922ms
Run 3 out of 3:
Using B1=3000000, B2=5706890290, polynomial Dickson(6), sigma=1407409785
Step 1 took 23641ms
Step 2 took 10718ms
Without --build=pentium4, the cpu is identified as i686, and the runs are MUCH slower than the ECM-6.2/GMP-4.2.2 run above; --build=pentium4 identifies it as i786 and gives a nice speedup, compared with ECM-6.2/GMP-4.2.2.
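Since three curves were run with each build, the speedup can be estimated from the per-curve averages rather than from a single run. This is a sketch only: the list names and `mean_total_ms` are made up for illustration, and three runs with different sigma values give no more than a rough figure.

```python
from statistics import mean

# (Step 1, Step 2) timings in ms for the three curves of each build, from the logs above.
ecm_620 = [(28469, 12938), (29375, 13656), (28969, 13765)]   # ECM-6.2 / GMP-4.2.2
ecm_623 = [(23078, 10938), (23453, 10922), (23641, 10718)]   # ECM-6.2.3 / GMP-4.3.0


def mean_total_ms(runs):
    """Average Step 1 + Step 2 time per curve."""
    return mean(s1 + s2 for s1, s2 in runs)


old, new = mean_total_ms(ecm_620), mean_total_ms(ecm_623)
print(f"6.2: {old:.0f} ms, 6.2.3: {new:.0f} ms, saving {100 * (1 - new / old):.1f}%")
```

On these numbers a curve takes about 19% less time with the 6.2.3 pentium4 build.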

P-1:

Code:
GMP-ECM 6.2 [powered by GMP 4.2.2] [P-1]
Input number is 18816141541139222511309815439534127651955651212035224345737809063431028813620968356115158131612051597 (101 digits)
Using B1=10000000, B2=117875629818, polynomial x^1, x0=1103425920
Step 1 took 8094ms
Step 2 took 9156ms
Code:
GMP-ECM 6.2.3 [powered by GMP 4.3.0] [P-1]
Input number is 18816141541139222511309815439534127651955651212035224345737809063431028813620968356115158131612051597 (101 digits)
Using B1=10000000, B2=117875629818, polynomial x^1, x0=3580459987
Step 1 took 7766ms
Step 2 took 9312ms
Stage 1 is a bit faster and stage 2 a bit slower; overall the speed is almost equal.
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.