mersenneforum.org Links to Precompiled GMP-ECM versions

2009-04-07, 11:27   #155
smh

"Sander"
Oct 2002
52.345322,5.52471

29·41 Posts

I've tried the newest 64bit core2 version from Jeff's site and tested it against the above c85
Code:
GMP-ECM 6.2.2 [powered by GMP 4.2.1_MPIR_1.0.0] [ECM]
Input number is 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561 (85 digits)
Using B1=3000000, B2=5706890290, polynomial Dickson(6), sigma=3473972786
Step 1 took 8673ms
Step 2 took 7332ms
I'm new to Linux, but I've finally managed to compile my own ecm and am surprised that it's significantly faster, even though it's running in VMware
Code:
GMP-ECM 6.2.2 [powered by GMP 4.2.4] [ECM]
Input number is 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561 (85 digits)
Using B1=3000000, B2=5706890290, polynomial Dickson(6), sigma=1798745233
Step 1 took 7872ms
Step 2 took 4660ms

2009-04-07, 14:59   #156
Jeff Gilchrist

Jun 2003

3·17·23 Posts

Quote:
 Originally Posted by smh I've tried the newest 64bit core2 version from Jeff's site and tested it against the above c85
Code:
GMP-ECM 6.2.2 [powered by GMP 4.2.1_MPIR_1.0.0] [ECM]
Input number is 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561 (85 digits)
Using B1=3000000, B2=5706890290, polynomial Dickson(6), sigma=3473972786
Step 1 took 8673ms
Step 2 took 7332ms
What processor do you have and what speed is it? I noticed your B2 value is much lower than mine (my B2=3178599824416) and you used a different sigma for your test (my sigma=3509569131). Were you just lowering that to reduce the time it takes to do the test?

The Windows MSVC code uses a different set of assembler routines than the Linux code, so it doesn't surprise me that the timing is different. If you choose the same sigma for both your Windows and Linux tests, and choose a larger B2 value so the test runs a little longer, do you still see the huge difference? Try running each test twice to make sure the numbers are similar, in case your system did something during one test that artificially slowed down the benchmark.
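
One way to follow this suggestion is to script the comparison so that both builds get the same number, B1, B2 and sigma. The sketch below is only an illustration under stated assumptions: the binary paths are hypothetical, and it assumes the command-line form `ecm -sigma S B1 B2` with the composite read from standard input, plus the `Step N took Xms` lines seen in the output quoted above.

```python
import re
import subprocess

# Matches the timing lines GMP-ECM prints, e.g. "Step 1 took 8673ms".
STEP_RE = re.compile(r"Step (\d) took (\d+)ms")


def build_ecm_command(binary, b1, b2, sigma):
    """Return argv for one curve with a fixed sigma.

    `binary` is a hypothetical path to an ecm executable.
    """
    return [binary, "-sigma", str(sigma), str(b1), str(b2)]


def parse_step_times(output):
    """Map step number -> milliseconds from 'Step N took Xms' lines."""
    return {int(step): int(ms) for step, ms in STEP_RE.findall(output)}


def run_once(binary, number, b1, b2, sigma):
    """Run a single curve, feeding the composite on stdin."""
    result = subprocess.run(
        build_ecm_command(binary, b1, b2, sigma),
        input=f"{number}\n",
        capture_output=True,
        text=True,
        check=False,  # the exit status is not treated as an error here
    )
    return parse_step_times(result.stdout)
```

Calling `run_once` twice for each binary with identical arguments gives four timing dictionaries that can be compared directly, which also shows whether a single run was disturbed by other activity on the machine.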

Jeff.

Last fiddled with by Jeff Gilchrist on 2009-04-07 at 14:59

2009-04-07, 15:26   #157
smh

"Sander"
Oct 2002
52.345322,5.52471

2245₈ Posts

I see that you used a B1=300M, I used 3M. I wasn't comparing directly with your run. I did two runs on my laptop (Core2duo T7800 @2,6GHz) on both the host (64-bit Vista) and a VM (64-bit Ubuntu 8.10).

2009-04-07, 15:32   #158
Yamato

Sep 2005
Berlin

1000010₂ Posts

@smh: Could you please post this binary and/or compare it with my 64-bit binary? I found binaries optimised for Athlon64 are even faster on Core2, in comparison to Core2-optimised ones.

2009-04-07, 17:01   #159
Jeff Gilchrist

Jun 2003

3·17·23 Posts

Quote:
 Originally Posted by smh I see that you used a B1=300M, I used 3M.
Ah, that would explain the difference.

Quote:
 Originally Posted by smh I wasn't comparing directly with your run. I did two runs on my laptop (Core2duo T7800 @2,6GHz) on both the host (64-bit Vista) and a VM (64-bit Ubuntu 8.10).
I realize that; I'm just trying to figure out why there is such a big difference. If you had an AMD processor, then I could see how the Core2 version would be slower than the Linux version (which would have detected an AMD processor if you compiled it yourself).

As I said before, Brian Gladman had to translate the assembler from the syntax used by GCC to the one that YASM (used in the MSVC build) understands. I think he said that some of the code in the Linux source is still newer than what he has translated. Since I'm not familiar with the code, I'm not sure why there is such a big difference.

Jeff.

2009-04-07, 18:54   #160
smh

"Sander"
Oct 2002
52.345322,5.52471

29·41 Posts

Quote:
 Originally Posted by Yamato @smh: Could you please post this binary and/or compare it with my 64-bit binary? I found binaries optimised for Athlon64 are even faster on Core2, in comparison to Core2-optimised ones.
With B1 <= 1M, there is too much variation to see a difference. With larger B1, yours is consistently faster in step 2, mine most of the time in step 1.

I did limited testing, but with larger composites yours might also be faster in step 1.

Note that I used GMP-ECM 6.2.2 and GMP 4.2.4 (with the core2 patch), so it might be apples and oranges.

With B1=3M
Code:
GMP-ECM 6.2.1 [powered by GMP 4.2.3] [ECM]
Input number is 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561 (85 digits)
Using B1=3000000, B2=5706890290, polynomial Dickson(6), sigma=959787799
Step 1 took 8008ms
Step 2 took 4496ms
Using B1=3000000, B2=3000000-5706890290, polynomial Dickson(6), sigma=1211299266
Step 1 took 7865ms
Step 2 took 4328ms
Using B1=3000000, B2=3000000-5706890290, polynomial Dickson(6), sigma=573230298
Step 1 took 7989ms
Step 2 took 4340ms

Input number is 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561 (85 digits)
Using B1=3000000, B2=5706890290, polynomial Dickson(6), sigma=937001321
Step 1 took 7808ms
Step 2 took 4500ms
Using B1=3000000, B2=3000000-5706890290, polynomial Dickson(6), sigma=1410435444
Step 1 took 7773ms
Step 2 took 4500ms
Using B1=3000000, B2=3000000-5706890290, polynomial Dickson(6), sigma=3426145601
Step 1 took 7921ms
Step 2 took 4500ms
With B1=11M
Code:
GMP-ECM 6.2.1 [powered by GMP 4.2.3] [ECM]
Input number is 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561 (85 digits)
Using B1=11000000, B2=35133391030, polynomial Dickson(12), sigma=1064336844
Step 1 took 29329ms
Step 2 took 14061ms
Using B1=11000000, B2=11000000-35133391030, polynomial Dickson(12), sigma=3355605506
Step 1 took 28858ms
Step 2 took 14157ms
Using B1=11000000, B2=11000000-35133391030, polynomial Dickson(12), sigma=191990272
Step 1 took 29342ms
Step 2 took 14181ms

Input number is 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561 (85 digits)
Using B1=11000000, B2=35133391030, polynomial Dickson(12), sigma=1387859769
Step 1 took 28389ms
Step 2 took 14777ms
Using B1=11000000, B2=11000000-35133391030, polynomial Dickson(12), sigma=4281716356
Step 1 took 27850ms
Step 2 took 14685ms
Using B1=11000000, B2=11000000-35133391030, polynomial Dickson(12), sigma=3779197836
Step 1 took 27638ms
Step 2 took 14681ms
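
To make the comparison above easier to read, the timings can be reduced to a mean and a spread (max minus min) per step. This is a minimal sketch: the numbers are copied from the two listings above, and the labels "first block" and "second block" are just names for the two groups of three curves in each listing, since the listings themselves do not say which binary produced which group.

```python
# Timings in milliseconds, copied from the B1=3M and B1=11M listings above.
RUNS_MS = {
    ("B1=3M", "first block"): {
        "step 1": [8008, 7865, 7989],
        "step 2": [4496, 4328, 4340],
    },
    ("B1=3M", "second block"): {
        "step 1": [7808, 7773, 7921],
        "step 2": [4500, 4500, 4500],
    },
    ("B1=11M", "first block"): {
        "step 1": [29329, 28858, 29342],
        "step 2": [14061, 14157, 14181],
    },
    ("B1=11M", "second block"): {
        "step 1": [28389, 27850, 27638],
        "step 2": [14777, 14685, 14681],
    },
}


def mean_and_spread(times_ms):
    """Return (mean, max - min) for a list of timings."""
    return sum(times_ms) / len(times_ms), max(times_ms) - min(times_ms)


def summarize(runs):
    """Apply mean_and_spread to every step of every group of runs."""
    return {
        key: {step: mean_and_spread(times) for step, times in steps.items()}
        for key, steps in runs.items()
    }


for (b1, block), steps in summarize(RUNS_MS).items():
    for step, (mean_ms, spread_ms) in steps.items():
        print(f"{b1} {block} {step}: mean {mean_ms:.0f} ms, spread {spread_ms} ms")
```

On these numbers the first block has the lower mean in step 2 at B1=11M, and the second block has the lower mean in step 1 at both B1 values, which matches the description given before the listings.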

2009-04-10, 13:51   #161
Jeff Gilchrist

Jun 2003
Ottawa, Canada

3·17·23 Posts

I took ECM 6.2.2 and compiled it with MPIR 1.0 in cygwin to compare the Linux code to what the Windows MSVC code is doing. I saw a similar pattern to all of you as well. This is all 32-bit code run on an Intel Core2 Q9550 @ 3.4GHz.

ECM Factoring: 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561
B1=20000000 Sigma: 980060817

MSVC 6.2.2 with new SSE2:
Step 1 took 82837ms | Step 1 took 82790ms
Step 2 took 41137ms | Step 2 took 41402ms

MSVC 6.2.2 without SSE2:
Step 1 took 82867ms | Step 1 took 83071ms
Step 2 took 42557ms | Step 2 took 43337ms

GCC cygwin (--enable-sse2 -enable-asm-redc) builds as pentium3
Step 1 took 78359ms | Step 1 took 78531ms
Step 2 took 34695ms | Step 2 took 34086ms

GCC cygwin (--enable-sse2 -enable-asm-redc --build=pentium4-pc-cygwin)
Step 1 took 78375ms | Step 1 took 78718ms
Step 2 took 24445ms | Step 2 took 24367ms

P-1 Factoring: 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561
B1=20000000 x0: 524328229

MSVC 6.2.2 with new SSE2:
Step 1 took 9469ms | Step 1 took 9563ms
Step 2 took 7098ms | Step 2 took 7051ms

MSVC 6.2.2 without SSE2:
Step 1 took 9360ms | Step 1 took 9235ms
Step 2 took 11731ms | Step 2 took 11404ms

GCC cygwin (--enable-sse2 -enable-asm-redc) builds as pentium3
Step 1 took 8751ms | Step 1 took 8487ms
Step 2 took 5788ms | Step 2 took 5740ms

GCC cygwin (--enable-sse2 -enable-asm-redc --build=pentium4-pc-cygwin)
Step 1 took 8455ms | Step 1 took 8658ms
Step 2 took 5788ms | Step 2 took 5710ms

P+1 Factoring: 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561
B1=20000000 x0: 524328229

MSVC 6.2.2 with new SSE2:
Step 1 took 17082ms | Step 1 took 17145ms
Step 2 took 8596ms | Step 2 took 8408ms

MSVC 6.2.2 without SSE2:
Step 1 took 17675ms | Step 1 took 17566ms
Step 2 took 15585ms | Step 2 took 15553ms

GCC cygwin (--enable-sse2 -enable-asm-redc) builds as pentium3
Step 1 took 14570ms | Step 1 took 14617ms
Step 2 took 7566ms | Step 2 took 7816ms

GCC cygwin (--enable-sse2 -enable-asm-redc --build=pentium4-pc-cygwin)
Step 1 took 14929ms | Step 1 took 14602ms
Step 2 took 7706ms | Step 2 took 7862ms

You can see that the new MSVC build that uses SSE2 is much faster in Stage 2 than the old build, but the Linux code built with gcc (in cygwin on Windows or whatever) is faster in both Stage 1 and Stage 2. So if you want the fastest possible ECM/P-1/P+1, you could install cygwin/mingw or run Linux, either natively or in a VM.

Jeff.

Last fiddled with by Jeff Gilchrist on 2009-04-10 at 14:49

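
The size of the gap in the ECM figures above can be expressed as a percentage of time saved. The sketch below uses only the two ECM rows "MSVC 6.2.2 with new SSE2" and "GCC cygwin (... --build=pentium4-pc-cygwin)"; the helper name `percent_less_time` is made up for this illustration, and each pair of runs is averaged before comparing.

```python
# ECM timings in milliseconds (two runs each), copied from the post above.
MSVC_SSE2 = {"step 1": [82837, 82790], "step 2": [41137, 41402]}
GCC_PENTIUM4 = {"step 1": [78375, 78718], "step 2": [24445, 24367]}


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def percent_less_time(baseline_ms, candidate_ms):
    """How much less time the candidate needs, as a percentage of the baseline."""
    return 100.0 * (1.0 - mean(candidate_ms) / mean(baseline_ms))


for step in ("step 1", "step 2"):
    saved = percent_less_time(MSVC_SSE2[step], GCC_PENTIUM4[step])
    print(f"ECM {step}: GCC pentium4 build needs {saved:.1f}% less time than MSVC with SSE2")
```

For these four pairs of runs the GCC pentium4 build needs roughly 5% less time in step 1 and roughly 41% less time in step 2 than the MSVC build with SSE2.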
2009-04-10, 14:38   #162
akruppa

"Nancy"
Aug 2002
Alexandria

100110100011₂ Posts

Quote:
 GCC cygwin (--enable-sse2 -enable-asm-redc) builds as pentium3
 Step 1 took 78359ms | Step 1 took 78531ms
 Step 2 took 34695ms | Step 2 took 34086ms
 GCC cygwin (--enable-sse2 -enable-asm-redc --build=pentium4-pc-cygwin)
 Step 1 took 78375ms | Step 1 took 78718ms
 Step 2 took 24445ms | Step 2 took 24367ms
This I find a bit strange... --enable-sse2 should always enable SSE2 code in stage 2, independent of build type (so long as it's a 32-bit build), so the stage 2 timings should not differ by this much. Did "HAVE_SSE2" get defined in config.h in both cases?
Then, with build type pentium4, the mulredc asm code from pentium4/ should be used instead of the code from athlon/, so on an actual Pentium 4 at least, the stage 1 time should differ. On what CPU type did you run these tests?

Alex

2009-04-10, 14:50   #163
Jeff Gilchrist

Jun 2003

3·17·23 Posts

Quote:
 Originally Posted by akruppa This I find a bit strange... --enable-sse2 should always enable SSE2 code in stage 2, independent of build type (so long as it's a 32-bit build), so the stage 2 timings should not differ by this much. Did "HAVE_SSE2" get defined in config.h in both cases? Then, with build type pentium4, the mulredc asm code from pentium4/ should be used instead of the code from athlon/, so on an actual Pentium 4 at least, the stage 1 time should differ. On what CPU type did you run these tests?
For whatever reason it thought my Intel Core2 Q9550 @ 3.4GHz was a pentium3 if I just let configure do its own thing.

Both config.h files contain #define HAVE_SSE2 1

Both linked the mulredc files from pentium4/
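
A check like the one described here can be scripted so that both build trees are inspected the same way. This is a rough sketch, not part of GMP-ECM: the function name is invented for the example, and it relies only on the usual autoconf convention that an enabled macro appears in config.h as a `#define` line while a disabled one appears as a commented-out `#undef`.

```python
import re


def macro_is_defined(config_text, name):
    """Return True if config.h text contains an active '#define NAME' line."""
    pattern = re.compile(
        rf"^\s*#\s*define\s+{re.escape(name)}\b", re.MULTILINE
    )
    return bool(pattern.search(config_text))


# Two small stand-ins for config.h contents (not taken from a real build).
enabled = "/* Define to 1 if SSE2 is available */\n#define HAVE_SSE2 1\n"
disabled = "/* Define to 1 if SSE2 is available */\n/* #undef HAVE_SSE2 */\n"

print(macro_is_defined(enabled, "HAVE_SSE2"))
print(macro_is_defined(disabled, "HAVE_SSE2"))
```

Reading the config.h of each build directory and passing its text to `macro_is_defined` would confirm whether both configurations really enabled the same code paths.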

Jeff.

Last fiddled with by Jeff Gilchrist on 2009-04-10 at 15:15

2009-04-10, 15:31   #164
rogue

"Mark"
Apr 2003
Between here and the

14242₈ Posts

Quote:
 Originally Posted by Jeff Gilchrist For whatever reason it thought my Intel Core2 Q9550 @ 3.4GHz was a pentium3 if I just let configure do its own thing.
Are you referring to GMP or GMP-ECM thinking it is a P3? My understanding (from the GMP folks) is that the Core 2 is built on a P3 architecture, not the P4 architecture, thus the P3 optimizations work better than the P4 optimizations. That doesn't explain the difference in your ECM run.

2009-04-10, 16:11   #165
Jeff Gilchrist

Jun 2003

3×17×23 Posts

Quote:
 Originally Posted by rogue Are you referring to GMP or GMP-ECM thinking it is a P3? My understanding (from the GMP folks) is that the Core 2 is built on a P3 architecture, not the P4 architecture, thus the P3 optimizations work better than the P4 optimizations. That doesn't explain the difference in your ECM run.
GMP-ECM thinks it's a P3. MPIR has core2/nocona-specific code, which was used when building that.

Jeff.
