mersenneforum.org  

Go Back   mersenneforum.org > Factoring Projects > GMP-ECM

Reply
 
Thread Tools
Old 2005-09-26, 23:30   #1
frmky
 
frmky's Avatar
 
Jul 2003
So Cal

2·34·13 Posts
Default GMP-ECM for Windows x64

Hi,

Brian Gladman has been working on porting GMP to Windows x64. He has produced a working GMP-ECM Win x64 binary, but he's run into a problem and we're hoping someone here can help. For step 1, the Win x64 binary runs at the same speed as a Linux AMD64 binary, which puts it at a bit over 3x the speed of a 32-bit Athlon-optimized binary on the same computer. However, in step 2, the Win x64 version is about 50% slower than the Linux AMD64 binary. It seems that some critical GMP function(s) used heavily in step 2 are not being optimized as well by the Windows compiler. To track this down, do you have any likely suspects concerning what GMP operations are heavily used in step 2 but not step 1?

Thanks,
Greg
frmky is offline   Reply With Quote
Old 2005-09-28, 06:32   #2
trilliwig
 
trilliwig's Avatar
 
Oct 2004
tropical Massachusetts

6910 Posts
Default

Have you tried running ECM with the -v parameter? It won't give you a really low-level operation count, but it will give you timings for the different parts of Stage 2. It'd be interesting to see if the slowdown is general to all of Stage 2 or if it's restricted to a particular portion.

If you need more fine-grained performance indicators I suppose the thing to do is try to use a profiler.
trilliwig is offline   Reply With Quote
Old 2005-09-29, 04:28   #3
frmky
 
frmky's Avatar
 
Jul 2003
So Cal

210610 Posts
Default

Thanks for the tip! It looks like step 2 is overall slower. Here are the results:

Linux AMD64
GMP-ECM 6.0.1 [powered by GMP 4.1.4] [ECM]
Input number is 193...823 (321 digits)
Using MODMULN
Using B1=1000000, B2=839549779, polynomial Dickson(3), sigma=10
Step 1 took 20506ms
B2'=974637522 k=2 b2=487687200 d=66990 d2=13 dF=6720, i0=2
Initializing tables of differences for F took 14ms
Computing roots of F took 292ms
Building F from its roots took 1270ms
Computing 1/F took 1018ms
Initializing table of differences for G took 15ms
Computing roots of G took 270ms
Building G from its roots took 1251ms
Computing roots of G took 267ms
Building G from its roots took 1256ms
Computing G * H took 672ms
Reducing G * H mod F took 959ms
Computing polyeval(F,G) took 3106ms
Step 2 took 10446ms

Win x64
GMP-ECM 6.0.1 (VC8-x64) [powered by GMP 4.1.4 (VC8-x64)] [ECM]
Input number is 193...823 (321 digits)
Using MODMULN
Using B1=1000000, B2=839549779, polynomial Dickson(3), sigma=10
Step 1 took 19735ms
B2'=974637522 k=2 b2=487687200 d=66990 d2=13 dF=6720, i0=2
Initializing tables of differences for F took 15ms
Computing roots of F took 310ms
Building F from its roots took 1525ms
Computing 1/F took 1901ms
Initializing table of differences for G took 15ms
Computing roots of G took 288ms
Building G from its roots took 1497ms
Computing roots of G took 288ms
Building G from its roots took 1494ms
Computing G * H took 809ms
Reducing G * H mod F took 1588ms
Computing polyeval(F,G) took 5621ms
Step 2 took 15429ms

As you can see, the Win x64 version was slightly faster in step 1, but was uniformly slower in step 2. Particularly bad, though, were the steps

Computing 1/F took 1018ms 1901ms
Reducing G * H mod F took 959ms 1588ms
Computing polyeval(F,G) took 3106ms 5621ms

where I've listed the Linux time followed by the x64 time. Not just to figure out what GMP instructions are most used in those steps...

Greg
frmky is offline   Reply With Quote
Old 2005-09-29, 05:01   #4
frmky
 
frmky's Avatar
 
Jul 2003
So Cal

40728 Posts
Default

OK...I tried gprof under linux. I did step 1 separately and created a save file. I then ran only step 2 using the binary with profiling enabled and here's where it says that the most time in step 2 is spent:

Code:
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
30.92     15.79    15.79  8674914     0.00     0.00  ecm_redc_basecase
 11.75     21.79     6.00    77840     0.00     0.00  __ecm_kronecker_schonhage
  6.32     25.02     3.23    86953     0.00     0.00  __ecm_toomcook3
  6.11     28.14     3.12    73664     0.00     0.00  TToomCookMul
  6.09     31.25     3.11     9139     0.00     0.00  addWnm
  5.70     34.16     2.91       78     0.04     0.04  __ecm_TMulKS
  4.86     36.64     2.48  8554711     0.00     0.00  __ecm_mpres_mul
  4.84     39.11     2.47 10012153     0.00     0.00  __ecm_mpres_sub
  3.72     41.01     1.90  9754754     0.00     0.00  __ecm_mpres_is_zero
  3.56     42.83     1.82  5767295     0.00     0.00  __ecm_list_add
  2.94     44.33     1.50  1491136     0.00     0.00  TKarMul
  2.80     45.76     1.43    77739     0.00     0.00  __ecm_toomcook4
Unfortunately, I don't have the tools to do this in Windows x64, so I can't directly compare. Any ideas where to look?

Thanks for any help!
Greg
frmky is offline   Reply With Quote
Old 2005-09-30, 18:42   #5
akruppa
 
akruppa's Avatar
 
"Nancy"
Aug 2002
Alexandria

2,467 Posts
Default

The profiler data looks a bit strange... redc_basecase() should only be called from the root generation functions and those don't take nearly 30% of the time. Most time of stage 2 is spent multiplying polynomials by the Kronecker-Schönhage algorithm. I know little about it, dave_dm (David Newman) and Paul Zimmermann wrote that. I'll ask on the developers mailing list.

Alex
akruppa is offline   Reply With Quote
Reply



Similar Threads
Thread Thread Starter Forum Replies Last Post
Is it worth the trouble to "upgrade" Windows 8 to Windows 7? ixfd64 Lounge 23 2013-04-13 11:12
Windows 7 Windows Update & Prime95 issue!!! Unregistered Information & Answers 14 2010-04-10 21:47
New PC - x64 Windows Prime95 Software 9 2005-09-24 03:56
PHP and windows and GMP ltd Prime Sierpinski Project 1 2005-07-12 21:44
Windows XP 64? jebeagles Software 7 2004-08-17 11:06

All times are UTC. The time now is 12:31.


Fri Jul 16 12:31:58 UTC 2021 up 49 days, 10:19, 2 users, load averages: 1.38, 1.23, 1.28

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.