mersenneforum.org  

Old 2009-04-07, 11:27   #155
smh
 
"Sander"
Oct 2002
52.345322,5.52471

29·41 Posts

I've tried the newest 64-bit core2 version from Jeff's site and tested it against the above c85:
Code:
GMP-ECM 6.2.2 [powered by GMP 4.2.1_MPIR_1.0.0] [ECM]
Input number is 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561 (85 digits)
Using B1=3000000, B2=5706890290, polynomial Dickson(6), sigma=3473972786
Step 1 took 8673ms
Step 2 took 7332ms
I'm new to Linux, but I've finally managed to compile my own ecm, and I'm surprised that it's significantly faster even though it's running in VMware:
Code:
GMP-ECM 6.2.2 [powered by GMP 4.2.4] [ECM]
Input number is 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561 (85 digits)
Using B1=3000000, B2=5706890290, polynomial Dickson(6), sigma=1798745233
Step 1 took 7872ms
Step 2 took 4660ms
Old 2009-04-07, 14:59   #156
Jeff Gilchrist
 
Jun 2003
Ottawa, Canada

3·17·23 Posts

Quote:
Originally Posted by smh View Post
I've tried the newest 64-bit core2 version from Jeff's site and tested it against the above c85:

Code:
GMP-ECM 6.2.2 [powered by GMP 4.2.1_MPIR_1.0.0] [ECM]
Input number is 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561 (85 digits)
Using B1=3000000, B2=5706890290, polynomial Dickson(6), sigma=3473972786
Step 1 took 8673ms
Step 2 took 7332ms
What processor do you have and what speed is it? I noticed your B2 value is much lower than mine (my B2=3178599824416) and you used a different sigma for your test (my sigma=3509569131). Were you just lowering that to reduce the time it takes to do the test?

The Windows MSVC code uses a different set of assembler routines than the Linux code, so it doesn't surprise me that the timing is different. If you choose the same sigma for both your Windows and Linux tests, and choose a larger B2 value so the test runs a little longer, do you still see the huge difference? Try running each test twice to make sure the numbers are similar, in case your system did something during the test that artificially slowed down one of the benchmarks.
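
A minimal sketch of that procedure, assuming the binary is on the PATH as `ecm` and reads the number from standard input: it pins sigma with GMP-ECM's `-sigma` option so both builds run the same curve, then pulls the step timings out of the output. The helper names (`ecm_command`, `parse_step_times`, `run_curve`) are made up for illustration.

```python
import re
import subprocess

# Matches the timing lines GMP-ECM prints, e.g. "Step 1 took 8673ms".
STEP_RE = re.compile(r"Step (\d) took (\d+)ms")

def ecm_command(b1, b2, sigma, binary="ecm"):
    """Build an argument list that fixes sigma, B1 and B2 for one curve."""
    return [binary, "-sigma", str(sigma), str(b1), str(b2)]

def parse_step_times(output):
    """Map step number -> milliseconds from 'Step N took Xms' lines."""
    return {int(step): int(ms) for step, ms in STEP_RE.findall(output)}

def run_curve(number, b1, b2, sigma, binary="ecm"):
    """Run one curve on the given number and return its step timings.

    The exit code is not checked, because it is not simply 0 on success.
    """
    result = subprocess.run(
        ecm_command(b1, b2, sigma, binary),
        input=f"{number}\n",
        capture_output=True,
        text=True,
    )
    return parse_step_times(result.stdout)
```

Calling `run_curve` twice per build with the same sigma, and comparing the two dictionaries, would show whether a single slow run was just background noise.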

Jeff.

Last fiddled with by Jeff Gilchrist on 2009-04-07 at 14:59
Old 2009-04-07, 15:26   #157
smh
 
"Sander"
Oct 2002
52.345322,5.52471

2245₈ Posts

I see that you used B1=300M; I used 3M.

I wasn't comparing directly with your run. I did two runs on my laptop (Core2 Duo T7800 @ 2.6GHz), on both the host (64-bit Vista) and a VM (64-bit Ubuntu 8.10).
Old 2009-04-07, 15:32   #158
Yamato
 
Sep 2005
Berlin

1000010₂ Posts

@smh:
Could you please post this binary and/or compare it with my 64-bit binary?

I found binaries optimised for Athlon64 are even faster on Core2, in comparison to Core2-optimised ones.
Old 2009-04-07, 17:01   #159
Jeff Gilchrist
 
Jun 2003
Ottawa, Canada

3·17·23 Posts

Quote:
Originally Posted by smh View Post
I see that you used B1=300M; I used 3M.
Ah, that would explain the difference.

Quote:
Originally Posted by smh View Post
I wasn't comparing directly with your run. I did two runs on my laptop (Core2 Duo T7800 @ 2.6GHz), on both the host (64-bit Vista) and a VM (64-bit Ubuntu 8.10).
I realize that; I'm just trying to figure out why there is such a big difference. If you had an AMD processor, then I could see how the Core2 version would be slower than the Linux version (which would have detected an AMD processor if you compiled it yourself).

As I said before, Brian Gladman had to translate the assembler from the syntax used by GCC to the one that YASM (used in the MSVC build) understands. I think he said that some of the code in the Linux source is still newer than what he has translated. Since I'm not familiar with the code, I'm not sure why there is such a big difference.

Jeff.
Old 2009-04-07, 18:54   #160
smh
 
"Sander"
Oct 2002
52.345322,5.52471

29·41 Posts

Quote:
Originally Posted by Yamato View Post
@smh:
Could you please post this binary and/or compare it with my 64-bit binary?

I found binaries optimised for Athlon64 are even faster on Core2, in comparison to Core2-optimised ones.
With B1 <= 1M, there is too much variation to see a difference. With larger B1, yours is consistently faster in step 2, and mine is faster most of the time in step 1.

I did limited testing, but with larger composites yours might also be faster in step 1.

Note that I used GMP-ECM 6.2.2 and GMP 4.2.4 (with the core2 patch), so it might be apples and oranges.

With B1=3M
Code:
GMP-ECM 6.2.1 [powered by GMP 4.2.3] [ECM]
Input number is 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561 (85 digits)
Using B1=3000000, B2=5706890290, polynomial Dickson(6), sigma=959787799
Step 1 took 8008ms
Step 2 took 4496ms
Using B1=3000000, B2=3000000-5706890290, polynomial Dickson(6), sigma=1211299266
Step 1 took 7865ms
Step 2 took 4328ms
Using B1=3000000, B2=3000000-5706890290, polynomial Dickson(6), sigma=573230298
Step 1 took 7989ms
Step 2 took 4340ms

GMP-ECM 6.2.2 [powered by GMP 4.2.4] [ECM]
Input number is 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561 (85 digits)
Using B1=3000000, B2=5706890290, polynomial Dickson(6), sigma=937001321
Step 1 took 7808ms
Step 2 took 4500ms
Using B1=3000000, B2=3000000-5706890290, polynomial Dickson(6), sigma=1410435444
Step 1 took 7773ms
Step 2 took 4500ms
Using B1=3000000, B2=3000000-5706890290, polynomial Dickson(6), sigma=3426145601
Step 1 took 7921ms
Step 2 took 4500ms
With B1=11M
Code:
GMP-ECM 6.2.1 [powered by GMP 4.2.3] [ECM]
Input number is 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561 (85 digits)
Using B1=11000000, B2=35133391030, polynomial Dickson(12), sigma=1064336844
Step 1 took 29329ms
Step 2 took 14061ms
Using B1=11000000, B2=11000000-35133391030, polynomial Dickson(12), sigma=3355605506
Step 1 took 28858ms
Step 2 took 14157ms
Using B1=11000000, B2=11000000-35133391030, polynomial Dickson(12), sigma=191990272
Step 1 took 29342ms
Step 2 took 14181ms

GMP-ECM 6.2.2 [powered by GMP 4.2.4] [ECM]
Input number is 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561 (85 digits)
Using B1=11000000, B2=35133391030, polynomial Dickson(12), sigma=1387859769
Step 1 took 28389ms
Step 2 took 14777ms
Using B1=11000000, B2=11000000-35133391030, polynomial Dickson(12), sigma=4281716356
Step 1 took 27850ms
Step 2 took 14685ms
Using B1=11000000, B2=11000000-35133391030, polynomial Dickson(12), sigma=3779197836
Step 1 took 27638ms
Step 2 took 14681ms
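
With only three curves per build, averaging the timings makes the comparison easier to read. The following is a rough sketch that hard-codes the numbers from the output above and labels each build by its version banner only; the `averages` helper is a name chosen for illustration.

```python
from statistics import mean

# (step 1 ms, step 2 ms) for each of the three curves, copied from the output above.
runs = {
    ("3M", "6.2.1 / GMP 4.2.3"): [(8008, 4496), (7865, 4328), (7989, 4340)],
    ("3M", "6.2.2 / GMP 4.2.4"): [(7808, 4500), (7773, 4500), (7921, 4500)],
    ("11M", "6.2.1 / GMP 4.2.3"): [(29329, 14061), (28858, 14157), (29342, 14181)],
    ("11M", "6.2.2 / GMP 4.2.4"): [(28389, 14777), (27850, 14685), (27638, 14681)],
}

def averages(curves):
    """Return (mean step 1 ms, mean step 2 ms) over a list of curves."""
    return mean(c[0] for c in curves), mean(c[1] for c in curves)

for (b1, build), curves in runs.items():
    step1, step2 = averages(curves)
    print(f"B1={b1:<3} {build}: step 1 {step1:.0f} ms, step 2 {step2:.0f} ms")
```

Because every curve uses a different sigma, these averages only indicate a trend; they are not a like-for-like measurement.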
Old 2009-04-10, 13:51   #161
Jeff Gilchrist
 
Jun 2003
Ottawa, Canada

3·17·23 Posts

I took ECM 6.2.2 and compiled it with MPIR 1.0 in cygwin to compare the Linux code with what the Windows MSVC code is doing. I saw a pattern similar to what all of you reported. This is all 32-bit code run on an Intel Core2 Q9550 @ 3.4GHz.

ECM
Factoring: 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561
B1=20000000
Sigma: 980060817

MSVC 6.2.2 with new SSE2:
Step 1 took 82837ms | Step 1 took 82790ms
Step 2 took 41137ms | Step 2 took 41402ms

MSVC 6.2.2 without SSE2:
Step 1 took 82867ms | Step 1 took 83071ms
Step 2 took 42557ms | Step 2 took 43337ms

GCC cygwin (--enable-sse2 -enable-asm-redc) builds as pentium3
Step 1 took 78359ms | Step 1 took 78531ms
Step 2 took 34695ms | Step 2 took 34086ms

GCC cygwin (--enable-sse2 -enable-asm-redc --build=pentium4-pc-cygwin)
Step 1 took 78375ms | Step 1 took 78718ms
Step 2 took 24445ms | Step 2 took 24367ms

P-1
Factoring: 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561
B1=20000000
x0: 524328229

MSVC 6.2.2 with new SSE2:
Step 1 took 9469ms | Step 1 took 9563ms
Step 2 took 7098ms | Step 2 took 7051ms

MSVC 6.2.2 without SSE2:
Step 1 took 9360ms | Step 1 took 9235ms
Step 2 took 11731ms | Step 2 took 11404ms

GCC cygwin (--enable-sse2 -enable-asm-redc) builds as pentium3
Step 1 took 8751ms | Step 1 took 8487ms
Step 2 took 5788ms | Step 2 took 5740ms

GCC cygwin (--enable-sse2 -enable-asm-redc --build=pentium4-pc-cygwin)
Step 1 took 8455ms | Step 1 took 8658ms
Step 2 took 5788ms | Step 2 took 5710ms

P+1
Factoring: 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561
B1=20000000
x0: 524328229

MSVC 6.2.2 with new SSE2:
Step 1 took 17082ms | Step 1 took 17145ms
Step 2 took 8596ms | Step 2 took 8408ms

MSVC 6.2.2 without SSE2:
Step 1 took 17675ms | Step 1 took 17566ms
Step 2 took 15585ms | Step 2 took 15553ms

GCC cygwin (--enable-sse2 -enable-asm-redc) builds as pentium3
Step 1 took 14570ms | Step 1 took 14617ms
Step 2 took 7566ms | Step 2 took 7816ms

GCC cygwin (--enable-sse2 -enable-asm-redc --build=pentium4-pc-cygwin)
Step 1 took 14929ms | Step 1 took 14602ms
Step 2 took 7706ms | Step 2 took 7862ms

You can see that the new MSVC build that uses SSE2 is much faster in Stage 2 of P-1 and P+1 than the old build (and slightly faster for ECM), but the Linux code built with gcc (in cygwin on Windows or elsewhere) is faster in both Stage 1 and Stage 2. So if you want the fastest possible ECM/P-1/P+1, you could install cygwin/mingw, or run Linux natively or in a VM.
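
To put numbers on that, here is a small sketch that averages the two runs of each stage and divides the MSVC SSE2 time by the GCC pentium4 time. The timings are copied from the tables above; the dictionary layout and the `speedup` helper are assumptions made for illustration.

```python
# (run 1 ms, run 2 ms) for each stage, copied from the timings above.
timings = {
    "ECM": {
        "MSVC SSE2": {"step1": (82837, 82790), "step2": (41137, 41402)},
        "GCC pentium4": {"step1": (78375, 78718), "step2": (24445, 24367)},
    },
    "P-1": {
        "MSVC SSE2": {"step1": (9469, 9563), "step2": (7098, 7051)},
        "GCC pentium4": {"step1": (8455, 8658), "step2": (5788, 5710)},
    },
    "P+1": {
        "MSVC SSE2": {"step1": (17082, 17145), "step2": (8596, 8408)},
        "GCC pentium4": {"step1": (14929, 14602), "step2": (7706, 7862)},
    },
}

def speedup(method, step, baseline="MSVC SSE2", candidate="GCC pentium4"):
    """Average baseline time divided by average candidate time (>1: candidate is faster)."""
    base = sum(timings[method][baseline][step]) / 2
    cand = sum(timings[method][candidate][step]) / 2
    return base / cand

for method in timings:
    s1 = speedup(method, "step1")
    s2 = speedup(method, "step2")
    print(f"{method}: step 1 x{s1:.2f}, step 2 x{s2:.2f}")
```

A ratio above 1 means the GCC build finished sooner; the largest gap in this data set is ECM step 2.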

Jeff.

Last fiddled with by Jeff Gilchrist on 2009-04-10 at 14:49
Old 2009-04-10, 14:38   #162
akruppa
 
"Nancy"
Aug 2002
Alexandria

100110100011₂ Posts

Quote:
GCC cygwin (--enable-sse2 -enable-asm-redc) builds as pentium3
Step 1 took 78359ms | Step 1 took 78531ms
Step 2 took 34695ms | Step 2 took 34086ms

GCC cygwin (--enable-sse2 -enable-asm-redc --build=pentium4-pc-cygwin)
Step 1 took 78375ms | Step 1 took 78718ms
Step 2 took 24445ms | Step 2 took 24367ms
This I find a bit strange... --enable-sse2 should always enable SSE2 code in stage 2, independent of build type (so long as it's a 32-bit build), so the stage 2 timings should not differ by this much. Did "HAVE_SSE2" get defined in config.h in both cases?
Then, with build type pentium4, the mulredc asm code from pentium4/ should be used instead of the code from athlon/, so on an actual Pentium 4 at least, the stage 1 time should differ. On what CPU type did you run these tests?
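
One quick way to answer the config.h question for each build tree is to scan the generated header for active `#define` lines. This is only a sketch: the `read_defines`, `has_sse2` and `check_build` helpers are hypothetical names, and the path passed to `check_build` is whatever directory the build was configured in.

```python
import re
from pathlib import Path

# An active define looks like "#define NAME value"; autoconf leaves
# undefined macros as comments such as "/* #undef NAME */", which do not match.
DEFINE_RE = re.compile(r"^\s*#\s*define\s+(\w+)(?:\s+(.*?))?\s*$")

def read_defines(text):
    """Return {macro: value} for every active #define line in the text."""
    defines = {}
    for line in text.splitlines():
        match = DEFINE_RE.match(line)
        if match:
            defines[match.group(1)] = match.group(2) or ""
    return defines

def has_sse2(config_h_text):
    """True when the header contains '#define HAVE_SSE2 1'."""
    return read_defines(config_h_text).get("HAVE_SSE2") == "1"

def check_build(build_dir):
    """Read config.h from a build directory and report the SSE2 setting."""
    return has_sse2((Path(build_dir) / "config.h").read_text())
```

Running `check_build` on both build directories would show whether the two configurations really differ in this macro.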

Alex
Old 2009-04-10, 14:50   #163
Jeff Gilchrist
 
Jun 2003
Ottawa, Canada

3·17·23 Posts

Quote:
Originally Posted by akruppa View Post
This I find a bit strange... --enable-sse2 should always enable SSE2 code in stage 2, independent of build type (so long as it's a 32-bit build), so the stage 2 timings should not differ by this much. Did "HAVE_SSE2" get defined in config.h in both cases?
Then, with build type pentium4, the mulredc asm code from pentium4/ should be used instead of the code from athlon/, so on an actual Pentium 4 at least, the stage 1 time should differ. On what CPU type did you run these tests?
For whatever reason it thought my Intel Core2 Q9550 @ 3.4GHz was a pentium3 if I just let configure do its own thing.

Both config.h files contain #define HAVE_SSE2 1

Both linked the mulredc files from pentium4/

Jeff.

Last fiddled with by Jeff Gilchrist on 2009-04-10 at 15:15
Old 2009-04-10, 15:31   #164
rogue
 
"Mark"
Apr 2003
Between here and the

14242₈ Posts

Quote:
Originally Posted by Jeff Gilchrist View Post
For whatever reason it thought my Intel Core2 Q9550 @ 3.4GHz was a pentium3 if I just let configure do its own thing.
Are you referring to GMP or GMP-ECM thinking it is a P3? My understanding (from the GMP folks) is that the Core 2 is built on the P3 architecture, not the P4 architecture, so the P3 optimizations work better than the P4 optimizations. That doesn't explain the difference in your ECM run.
Old 2009-04-10, 16:11   #165
Jeff Gilchrist
 
Jun 2003
Ottawa, Canada

3×17×23 Posts

Quote:
Originally Posted by rogue View Post
Are you referring to GMP or GMP-ECM thinking it is a P3? My understanding (from the GMP folks) is that the Core 2 is built on the P3 architecture, not the P4 architecture, so the P3 optimizations work better than the P4 optimizations. That doesn't explain the difference in your ECM run.
GMP-ECM thinks it's a P3. MPIR has core2/nocona-specific code, which was used when building that.

Jeff.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.