20040921, 14:07  #34  
Mar 2003
New Zealand
13·89 Posts 
Quote:
45 digits: B1=11e6, B2=4e9, reported as 1.33 standard curves. 50 digits: B1=44e6, B2=26e9, reported as 1.5 standard curves. (I have rounded the reporting rate down to a nice integer ratio so I believe that I am being a little conservative in the work I report). These are based on an average cofactor size of about 200 digits, doing stage one on a Celeron and stage two by running two instances of gmpecm 5.0.3 on a hyperthreaded P4. I wouldn't be surprised if the optimal B2 was quite a bit smaller when mprime can combine four numbers into a single stage one step, especially with a larger cofactor size, but in any case I think it would be worthwhile to spend at least as long doing stage two as mprime does. At the moment I am working on numbers from the 2+ tables where I don't get the free stage one for the associated 2 mate, so I use these 45 digit curves: Medium (c200) cofactor: B1=11e6, B2=8e9, reported as 1.5 standard curves. Small (c160) cofactor: B1=11e6, B2=11e9, reported as 1.66 standard curves. In most cases the stage two time works out at 6080% of the stage one time. (after dividing the stage one time by the number of cofactors it splits into for stage two). This is similar to what I would get if I ran both stages of the curve using gmpecm with default settings. 

20040921, 15:13  #35 
Aug 2002
Termonfeckin, IE
2^{4}×173 Posts 
Thanks Geoff. Your post really helps. Just what I was looking for. For all practical purposes, all one needs to know is the optimal B2 (optimizing against expected time per factor found) and then how many curves to report it as.
Getting into the details requires reading the Wagstaff/Silverman paper and parts of a dissertation which I will do this weekend. But that is mostly academic. Putting on an engineering hat, I wouldn't save that much more time by going into the details and computing the numbers of curves required for a given B2 with a finer precision than we already have. 
20040921, 16:06  #36  
Mar 2003
New Zealand
13·89 Posts 
Quote:
I have recently tried the ksmultiply patch with gmpecm 5.0.3 from this thread http://www.mersenneforum.org/showthread.php?t=2923, which I gather will be in the new gmpecm when it is released. The stage two improvement is quite impressive, especially with large B2, but the memory usage is much higher which will have a big effect on the strategy for choosing B2. Here are a few sample curve times (Machine: P4 Northwood at 2.9GHz, OS: Debian Linux, SMP kernel 2.6, libgmp: gmp 4.1.3 configured for P4 architecture.) Code:
unpatched patched size B2 range polynomial memory time memory time improvement  c151 44e6180e9 dickson 30 150MB 706s 520MB 479s 47% 44e644e9 dickson 30 74MB 275s 258MB 212s 30% c163 11e626e9 dickson 12 59MB 190s 212MB 146s 30% 11e611e9 dickson 12 38MB 108s 131MB 89s 21% c213 11e626e9 dickson 12 74MB 247s 272MB 196s 26% 11e611e9 dickson 12 48MB 143s 167MB 121s 18% 

20040921, 17:10  #37 
Aug 2002
Termonfeckin, IE
2^{4}×173 Posts 
Thanks geoff for the pointer to the patch. However, it does not work as it complains about the BITS_PER_MP_LIMB constant. Also, the file gmpimpl.h is referred to but is not included in the patch.
Here is what I get ksmultiply.c:22:23: gmpimpl.h: No such file or directory ksmultiply.c: In function `kronecker_schonhage': ksmultiply.c:70: `BITS_PER_MP_LIMB' undeclared (first use in this function) ksmultiply.c:70: (Each undeclared identifier is reported only once ksmultiply.c:70: for each function it appears in.) 
20040921, 17:31  #38  
Mar 2003
New Zealand
13·89 Posts 
Quote:
(BTW the times in the previous post were for stage two only). 

20040922, 09:03  #39 
Aug 2002
Termonfeckin, IE
5320_{8} Posts 
I copied all the .h files from the gmp directory to the ecm directory and it compiled straight away. And boy it speeded up. Stage 2 with the gmpecm default of 25.7e9 went about 20% faster on my P4.
Alex, any comments on whether this patched version can be used? I'll stick with the regular till I hear from you. 
20040922, 09:31  #40 
"Nancy"
Aug 2002
Alexandria
2,467 Posts 
I didn't get involved much with the KS code, Dave and Paul did that. They made some changes to the code, but I think they were mostly speedups and reducing memory allocation. Perhaps Dave can comment?
Alex 
20040922, 11:36  #41 
May 2004
2^{4}×5 Posts 
gmpimpl.h wasn't included as it declares some cpudependent constants and macros. But apart from BITS_PER_MP_LIMB, I don't think which gmpimpl.h you use affects the ks code much. This file isn't needed in the current version of ks as Paul Z doesn't like including it  instead all of the relevant bits are copied into ecmgmp.h.
Wrt memory usage  the original ks memory allocation didn't work as it intended and ended up mallocing all over the place. This has been fixed. It still uses more than the old multiplication, maybe twice as much, maybe less, but certainly not the 34x factor Geoff's table shows. (I haven't done any analysis on the current code myself). Since the ks patch, the default B2 and the default k values obviously need to be updated  I think Paul Z is handling/has handled this. Dave 
20040922, 11:42  #42 
May 2004
2^{4}·5 Posts 
garo: the patched version can certainly be used, as long as you don't mind the excessive memory allocation. Run test.ecm to check it works.

20040922, 12:19  #43 
Aug 2002
Termonfeckin, IE
2^{4}·173 Posts 
Thanks. For the moment then, I'll just copy the BITS_PER_MP_LIMB variable to ecm.h and get rid of all the other gmp include files. And run test.ecm before using it.
PS: As you may have seen above, the default kvalue was not always the fastest on my P4 and my Athlon for several values of B2. 
20040923, 07:53  #44 
Aug 2002
Termonfeckin, IE
2768_{10} Posts 
So I compiled the version with the ksmultiplypatch and ran it on range of numbers. It seems to work fine and runs faster than plain GMPECM5.0.3 for higher B2 values.
However, I've run into a problem with a specific save file from mprime. I'm attaching it below, but when I run it with k=4 and B2=4e10 it does a segmentation fault at the stage of computing 1/F. The plain ECM5.0.3 does nto crash for the same values. With the same parameters and a different residue file it does not crash. With same residue and B2 bound and k=5 it does nto crash. And with same residue, k=4 and B2=4e9 it does not crash. Here is the residue file attached. I hope someone, particularly dave can have a look at it. Oh and here is the stackdump: Code:
Exception: STATUS_STACK_OVERFLOW at eip=00420023 eax=00148490 ebx=00000019 ecx=00032800 edx=00000000 esi=000D0D24 edi=000D0D0B ebp=0022E838 esp=0022E7F8 program=F:\cygwin\home\ecm\ecm.exe, pid 488, thread main cs=001B ds=0023 es=0023 fs=0038 gs=0000 ss=0023 Stack trace: Frame Function Args 0022E838 00420023 (14BB53CC, 141FAD18, 000D0D0B, 0FB71124) 0022E898 0040FDFA (0A129BC8, 1155D064, 0A051500, 00003DE0) 0022E8E8 0040537B (1152E9F0, 0A051500, 00007BBF, 0A129BC8) 0022E9C8 0040B40D (0022F000, 0022EA30, 0022EA60, 00000000) 0022EAD8 00403025 (0022F000, 0022F040, 0022F030, 0022F010) 0022F068 00406EAE (00000003, 61782CBC, 0A0500A8, 0022F0C0) 0022F0A8 61005F34 (0022F0C0, 77F56BB3, 77DD0000, 77F5BCD4) 0022FF88 6100614B (00000000, 00000000, 00000000, 00000000) End of stack trace Last fiddled with by garo on 20040923 at 08:01 
Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Nice progress!  schickel  FactorDB  29  20120718 17:03 
Nice pic  Dubslow  Forum Feedback  0  20120502 02:13 
Let's do another nice big GNFS job!  fivemack  Factoring  84  20110426 10:22 
M971 factored  philmoore  Lounge  5  20040914 20:18 
Nice link...  Xyzzy  Lounge  4  20030628 13:37 