![]() |
|
|
#1 |
|
Jul 2003
So Cal
2·34·13 Posts |
Hi,
Brian Gladman has been working on porting GMP to Windows x64. He has produced a working GMP-ECM Win x64 binary, but he's run into a problem and we're hoping someone here can help. For step 1, the Win x64 binary runs at the same speed as a Linux AMD64 binary, which puts it at a bit over 3x the speed of a 32-bit Athlon-optimized binary on the same computer. However, in step 2, the Win x64 version is about 50% slower than the Linux AMD64 binary. It seems that some critical GMP function(s) used heavily in step 2 are not being optimized as well by the Windows compiler. To track this down, do you have any likely suspects concerning what GMP operations are heavily used in step 2 but not step 1? Thanks, Greg |
|
|
|
|
|
#2 |
|
Oct 2004
tropical Massachusetts
6910 Posts |
Have you tried running ECM with the -v parameter? It won't give you a really low-level operation count, but it will give you timings for the different parts of Stage 2. It'd be interesting to see if the slowdown is general to all of Stage 2 or if it's restricted to a particular portion.
If you need more fine-grained performance indicators I suppose the thing to do is try to use a profiler. |
|
|
|
|
|
#3 |
|
Jul 2003
So Cal
210610 Posts |
Thanks for the tip! It looks like step 2 is overall slower. Here are the results:
Linux AMD64 GMP-ECM 6.0.1 [powered by GMP 4.1.4] [ECM] Input number is 193...823 (321 digits) Using MODMULN Using B1=1000000, B2=839549779, polynomial Dickson(3), sigma=10 Step 1 took 20506ms B2'=974637522 k=2 b2=487687200 d=66990 d2=13 dF=6720, i0=2 Initializing tables of differences for F took 14ms Computing roots of F took 292ms Building F from its roots took 1270ms Computing 1/F took 1018ms Initializing table of differences for G took 15ms Computing roots of G took 270ms Building G from its roots took 1251ms Computing roots of G took 267ms Building G from its roots took 1256ms Computing G * H took 672ms Reducing G * H mod F took 959ms Computing polyeval(F,G) took 3106ms Step 2 took 10446ms Win x64 GMP-ECM 6.0.1 (VC8-x64) [powered by GMP 4.1.4 (VC8-x64)] [ECM] Input number is 193...823 (321 digits) Using MODMULN Using B1=1000000, B2=839549779, polynomial Dickson(3), sigma=10 Step 1 took 19735ms B2'=974637522 k=2 b2=487687200 d=66990 d2=13 dF=6720, i0=2 Initializing tables of differences for F took 15ms Computing roots of F took 310ms Building F from its roots took 1525ms Computing 1/F took 1901ms Initializing table of differences for G took 15ms Computing roots of G took 288ms Building G from its roots took 1497ms Computing roots of G took 288ms Building G from its roots took 1494ms Computing G * H took 809ms Reducing G * H mod F took 1588ms Computing polyeval(F,G) took 5621ms Step 2 took 15429ms As you can see, the Win x64 version was slightly faster in step 1, but was uniformly slower in step 2. Particularly bad, though, were the steps Computing 1/F took 1018ms 1901ms Reducing G * H mod F took 959ms 1588ms Computing polyeval(F,G) took 3106ms 5621ms where I've listed the Linux time followed by the x64 time. Not just to figure out what GMP instructions are most used in those steps... Greg |
|
|
|
|
|
#4 |
|
Jul 2003
So Cal
40728 Posts |
OK...I tried gprof under linux. I did step 1 separately and created a save file. I then ran only step 2 using the binary with profiling enabled and here's where it says that the most time in step 2 is spent:
Code:
% cumulative self self total time seconds seconds calls s/call s/call name 30.92 15.79 15.79 8674914 0.00 0.00 ecm_redc_basecase 11.75 21.79 6.00 77840 0.00 0.00 __ecm_kronecker_schonhage 6.32 25.02 3.23 86953 0.00 0.00 __ecm_toomcook3 6.11 28.14 3.12 73664 0.00 0.00 TToomCookMul 6.09 31.25 3.11 9139 0.00 0.00 addWnm 5.70 34.16 2.91 78 0.04 0.04 __ecm_TMulKS 4.86 36.64 2.48 8554711 0.00 0.00 __ecm_mpres_mul 4.84 39.11 2.47 10012153 0.00 0.00 __ecm_mpres_sub 3.72 41.01 1.90 9754754 0.00 0.00 __ecm_mpres_is_zero 3.56 42.83 1.82 5767295 0.00 0.00 __ecm_list_add 2.94 44.33 1.50 1491136 0.00 0.00 TKarMul 2.80 45.76 1.43 77739 0.00 0.00 __ecm_toomcook4 Thanks for any help! Greg |
|
|
|
|
|
#5 |
|
"Nancy"
Aug 2002
Alexandria
2,467 Posts |
The profiler data looks a bit strange... redc_basecase() should only be called from the root generation functions and those don't take nearly 30% of the time. Most time of stage 2 is spent multiplying polynomials by the Kronecker-Schönhage algorithm. I know little about it, dave_dm (David Newman) and Paul Zimmermann wrote that. I'll ask on the developers mailing list.
Alex |
|
|
|
![]() |
Similar Threads
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| Is it worth the trouble to "upgrade" Windows 8 to Windows 7? | ixfd64 | Lounge | 23 | 2013-04-13 11:12 |
| Windows 7 Windows Update & Prime95 issue!!! | Unregistered | Information & Answers | 14 | 2010-04-10 21:47 |
| New PC - x64 Windows | Prime95 | Software | 9 | 2005-09-24 03:56 |
| PHP and windows and GMP | ltd | Prime Sierpinski Project | 1 | 2005-07-12 21:44 |
| Windows XP 64? | jebeagles | Software | 7 | 2004-08-17 11:06 |