Sounds like a Pentium4 Northwood or Willamette, try:
GMP-ECM 6.2:
Pentium 4 (Northwood) 2.4 GHz: [URL="http://www.hoegge.dk/mersenne/ecm62-p4n.zip"]ecm62-p4n.zip[/URL]
--enable-asm-redc: [URL="http://www.hoegge.dk/mersenne/ecm62-p4n-asmredc.zip"]ecm62-p4n-asmredc.zip[/URL]
GMP-ECM 6.1.3 (don't have the newest for the Willamette):
Pentium 4 (Willamette) 1.50 GHz: [URL="http://www.hoegge.dk/mersenne/ecm613-gmp422-p4w.zip"]ecm613-gmp422-p4w.zip[/URL]
--enable-asm-redc: [URL="http://www.hoegge.dk/mersenne/ecm613-gmp422-p4w-asmredc.zip"]ecm613-gmp422-p4w-asmredc.zip[/URL] |
The binaries disappear in a split-second when I open them. Is there need for a worktodo.ini?
|
[QUOTE=10metreh;149257]The binaries disappear in a split-second when I open them. Is there need for a worktodo.ini?[/QUOTE]
You should read up on the usage of GMP-ECM first. Put your input number in a file (e.g. "in.txt"), followed by a newline. Then open a console and type
ecm -c 25 1e6 < in.txt
This will instruct the program to run 25 curves with B1 = 1e6 from the input file. You can also use a batch file instead. The following binary should also work on P4 computers and is faster for small input numbers (say < 200 digits): [URL="http://www2.informatik.hu-berlin.de/~schoenbe/ecm621_win32_core2.zip"]ecm621_win32_core2.zip[/URL] |
[QUOTE=Yamato;149272]You should read up on the usage of GMP-ECM first. Put your input number in a file (e.g. "in.txt"), followed by a newline. Then open a console and type
ecm -c 25 1e6 < in.txt
This will instruct the program to run 25 curves with B1 = 1e6 from the input file. You can also use a batch file instead.[/QUOTE]
If you want to have a logfile (timings and factors will go into this file), then type:
ecm -c 25 1e6 <in.txt >>outputfile.txt
">>outputfile.txt" with two ">" will append to the logfile (if it exists), ">outputfile.txt" (with one ">") will overwrite the logfile. |
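The same invocation can also be scripted instead of typed. The following is only a sketch: it assumes an `ecm` executable on the PATH and uses nothing beyond the `-c` option and the two numeric arguments shown above. The function names `build_ecm_command` and `run_ecm` are made up for this illustration.

```python
import subprocess


def build_ecm_command(curves, b1, ecm_path="ecm"):
    """Return the argument list equivalent to 'ecm -c <curves> <B1>'."""
    return [ecm_path, "-c", str(curves), str(b1)]


def run_ecm(input_file, log_file, curves=25, b1="1e6", append=True):
    """Run ecm with stdin taken from input_file and stdout sent to log_file.

    append=True mirrors '>>outputfile.txt'; append=False mirrors '>outputfile.txt'.
    The exit code is returned unchanged and is not treated as an error here.
    """
    mode = "ab" if append else "wb"
    with open(input_file, "rb") as src, open(log_file, mode) as log:
        return subprocess.call(build_ecm_command(curves, b1), stdin=src, stdout=log)
```

Calling `run_ecm("in.txt", "outputfile.txt")` would then correspond to the logfile command line above.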
GMP-ECM has no graphical interface, it is a command line program. You can download this file: [URL="http://www.hoegge.dk/upload/dosprompt.vbs"]dosprompt.vbs[/URL] and run it.
Then you can rightclick on the folder you unpacked GMP-ECM to and choose "Command Prompt here" to get a command window in that folder. Now you can use the line Yamato and Andi47 posted, or use: ecm --help to get a list of options. Also read the GMP-ECM readme file: [URL="http://www.hoegge.dk/mersenne/README.txt"]README.txt[/URL] |
When I type in Andi47's exact line, the program only runs one curve!
|
[QUOTE=10metreh;149334]When I type in Andi47's exact line, the program only runs one curve![/QUOTE]
Have you put a [B]newline[/B] at the end of the file? |
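If the missing newline keeps biting, the input file can be checked before it is handed to GMP-ECM. This is a minimal sketch; the helper name `ensure_trailing_newline` is hypothetical, and it only looks at the last byte of the file.

```python
def ensure_trailing_newline(path):
    """Append a newline if the file does not already end with one.

    Returns True if the file was modified, False otherwise.
    An empty file is left alone.
    """
    with open(path, "rb+") as handle:
        handle.seek(0, 2)            # jump to the end of the file
        if handle.tell() == 0:
            return False
        handle.seek(-1, 2)           # step back to the last byte
        if handle.read(1) == b"\n":
            return False
        handle.seek(0, 2)            # reposition before switching to writing
        handle.write(b"\n")
        return True
```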
[quote=Yamato;149344]Have you put a [B]newline[/B] at the end of the file?[/quote]
Could that bug (feature) be corrected? |
[URL]http://www.mersenneforum.org/showthread.php?t=3922[/URL]
Look at this thread ^^^ ;) |
GMP-ECM [B][I]6.2.1[/I][/B] with [I]GMP 4.2.4[/I] compiled on Windows XP/Vista(32bit) with Msys+MinGW:
Core2Duo (Conroe) E6750 2.66 Ghz: [URL="http://www.hoegge.dk/mersenne/ecm621-c2d.zip"]ecm621-c2d.zip[/URL](32bit) --enable-asm-redc: [URL="http://www.hoegge.dk/mersenne/ecm621-c2d-asmredc.zip"]ecm621-c2d-asmredc.zip[/URL](32bit) Mobile Core2Duo T7300 (Merom) 2.00 Ghz: [URL="http://www.hoegge.dk/mersenne/ecm621-mobc2d.zip"]ecm621-mobc2d.zip[/URL](32bit) --enable-asm-redc: [URL="http://www.hoegge.dk/mersenne/ecm621-mobc2d-asmredc.zip"]ecm621-mobc2d-asmredc.zip[/URL](32bit) Pentium 4 550 (Prescott) 3.4 Ghz: [URL="http://www.hoegge.dk/mersenne/ecm621-p4p.zip"]ecm621-p4p.zip[/URL] --enable-asm-redc: [URL="http://www.hoegge.dk/mersenne/ecm621-p4p-asmredc.zip"]ecm621-p4p-asmredc.zip[/URL] |
Thanks!
The p25 factor of 10^100+11 (my test) was found on only the second curve of the 20 digit level! |
GMP-ECM [B]6.2.1[/B] with [B]GMP 4.2.4[/B] compiled on Windows XP/Vista [B]64bit[/B] with Visual Studio 2008:
Core2: [url]http://gilchrist.ca/jeff/ecm/ecm621_win64_core2.zip[/url] |
Are there binaries for Intel-Mac, PPC-Mac and Sparc-Solaris?
yoyo |
[QUOTE=yoyo;155644]Are there binaries for Intel-Mac, PPC-Mac and Sparc-Solaris?
yoyo[/QUOTE] Building a binary for MacIntel or MacPPC is very easy. You should be able to do it without any problems. |
[QUOTE=rogue;155649]Building a binary for MacIntel or MacPPC is very easy. You should be able to do it without any problems.[/QUOTE]
I know, but I do not have a Mac. :( yoyo |
[quote=yoyo;155868]I know, but I do not have a Mac. :(
yoyo[/quote] Why do you need one? |
[QUOTE=10metreh;155889]Why do you need one?[/QUOTE]
I set up a distributed computing project in which I wrapped gmp-ecm into the Boinc world. Many users have Macs and want to help. Therefore I also need Mac versions of gmp-ecm. yoyo Boinc: [url]http://boinc.berkeley.edu/[/url] |
[QUOTE=yoyo;156320]I setup a distributed computing project, where I wrapped gmp-ecm into the Boinc world. Many users have Macs and want help. Therefore I need alsoMac versions of gmp-ecm.
yoyo Boinc: [url]http://boinc.berkeley.edu/[/url][/QUOTE] Building GMP-ECM isn't a problem on Macs, but linking with BOINC is. I've tried to link my software with the BOINC libraries with no success. Unfortunately the BOINC community hasn't been helpful in resolving those issues. |
I have decided to keep a web page of all the pre-compiled binaries I'm making for factoring programs, mostly 64bit Windows ones. You can find the latest version of GMP-ECM here:
[url]http://gilchrist.ca/jeff/factoring/[/url] |
[QUOTE=Jeff Gilchrist;158149]I have decided to keep a web page of all the pre-compiled binaries I'm making for factoring programs, mostly 64bit Windows ones. You can find the latest version of GMP-ECM here:
[url]http://gilchrist.ca/jeff/factoring/[/url][/QUOTE] super :tu: |
I have now updated my website to include new compiled versions of GMP-ECM with tweaked ecm-params.h settings, making them slightly faster than before. There are also now 32bit Athlon and 64bit Opteron binaries along with the 32bit Core2 and 64bit Core2 ones.
Available: [url]http://gilchrist.ca/jeff/factoring/[/url] * Note - I don't have a 64bit Opteron system with 64bit Windows so I can't check to see how fast it is or if it works properly. Can anyone try it out and let me know? Jeff. |
I have updated my factoring binaries with the new 6.2.2 release:
[url]http://gilchrist.ca/jeff/factoring/[/url]
There are Windows 32bit Intel and AMD versions, along with Windows 64bit Core2 and AMD64 versions. Speed is essentially the same:
[CODE]Intel Core2 Q9550 @ 3.4GHz (Vista 64bit)
Tests using N = 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561

P+1 (B1=20M, B2=2147483647, x0=524328229)
=========================================
GMP-ECM 6.2.2 64bit MSVC = 0m 10.821s
GMP-ECM 6.2.1 64bit MSVC = 0m 10.883s
GMP-ECM 6.2.2 32bit MSVC = 0m 19.362s
GMP-ECM 6.2.1 32bit MSVC = 0m 19.408s

P-1 (B1=20M, B2=2147483647, x0=524328229)
=========================================
GMP-ECM 6.2.2 64bit MSVC = 0m 04.310s
GMP-ECM 6.2.1 64bit MSVC = 0m 05.533s
GMP-ECM 6.2.2 32bit MSVC = 0m 10.633s
GMP-ECM 6.2.1 32bit MSVC = 0m 10.527s

ECM (B1=20M, B2=2147483647, sigma=980060817)
============================================
GMP-ECM 6.2.2 64bit MSVC = 0m 46.153s
GMP-ECM 6.2.1 64bit MSVC = 0m 46.754s
GMP-ECM 6.2.2 32bit MSVC = 1m 29.924s
GMP-ECM 6.2.1 32bit MSVC = 1m 29.753s[/CODE] |
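To put a number on the 32bit versus 64bit gap in that table, the timings can be converted to seconds and divided. The sketch below copies only the 6.2.2 rows from the post; `to_seconds` and `ratio_32_over_64` are names invented for this example.

```python
import re


def to_seconds(text):
    """Convert a string such as '1m 29.924s' to seconds."""
    match = re.fullmatch(r"\s*(\d+)m\s*(\d+(?:\.\d+)?)s\s*", text)
    if match is None:
        raise ValueError("unrecognised time format: %r" % text)
    return int(match.group(1)) * 60 + float(match.group(2))


# GMP-ECM 6.2.2 timings copied from the table in the post above.
TIMINGS_622 = {
    "P+1": {"64bit": "0m 10.821s", "32bit": "0m 19.362s"},
    "P-1": {"64bit": "0m 04.310s", "32bit": "0m 10.633s"},
    "ECM": {"64bit": "0m 46.153s", "32bit": "1m 29.924s"},
}


def ratio_32_over_64(method):
    """How many times longer the 32bit build took than the 64bit build."""
    row = TIMINGS_622[method]
    return to_seconds(row["32bit"]) / to_seconds(row["64bit"])


if __name__ == "__main__":
    for method in TIMINGS_622:
        print("%s: 32bit takes %.2fx as long as 64bit" % (method, ratio_32_over_64(method)))
```

On these figures the 32bit builds take roughly twice as long as the 64bit ones for all three methods.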
Hello,
thanks Jeff for providing the binaries. I just made a request to my Boinc users to check the speed of the gmp-ecm binaries. Results are posted [url=http://www.rechenkraft.net/phpBB/viewtopic.php?f=56&t=9555]here[/url]. It would also be good to have binaries for other operating systems (Linux, Mac, ...) in one central place. yoyo |
[QUOTE=yoyo;166558]Hello,
thanks Jeff for providing the binaries. I just made an request to my Boinc users to check the speed of the gmp-ecm binaries. Results are posted [url=http://www.rechenkraft.net/phpBB/viewtopic.php?f=56&t=9555]here[/url].[/QUOTE] Interesting, thanks for posting the link. In some cases the 6.2.1 was faster and 6.2.2 for others. At least the AMD version was always faster on AMD systems and Intel version for Intel systems so I'm glad I went to the trouble of producing both sets. Jeff. |
Jeff, I notice your builds for ECM 6.2.2 are now linked with [MPIR 1.0 (GMP 4.2.1)]. Is it safe to assume your configure options include --enable-asm-redc?
|
[QUOTE=tmorrow;166625]Jeff, I notice your your builds for ECM6.22 are now linked with [MPIR 1.0 (GMP 4.2.1)]. Is it safe to assume your configure options include --enable-asm-redc?[/QUOTE]
Yes, with the MSVC project file, the redc assembler code is enabled by default. I'm using MPIR 1.0.0 now since it is faster than GMP 4.2.4 (the third party core2/amd64 optimizations don't work in Windows MSVC) and MPIR will have good and continuing Windows support built-in. Jeff. |
[QUOTE=Jeff Gilchrist;166577]Interesting, thanks for posting the link. In some cases the 6.2.1 was faster and 6.2.2 for others. At least the AMD version was always faster on AMD systems and Intel version for Intel systems so I'm glad I went to the trouble of producing both sets.
Jeff.[/QUOTE] I made this request to find out whether I use the right Windows version of gmp-ecm in Boinc, and whether 6.2.2 is faster so that I should update the Boinc network to this version. yoyo |
[QUOTE=Jeff Gilchrist;166637]Yes, with the MSVC project file, the redc assembler code is enabled by default. I'm using MPIR 1.0.0 now since it is faster than GMP 4.2.4 (the third party core2/amd64 optimizations don't work in Windows MSVC) and MPIR will have good and continuing Windows support built-in.
Jeff.[/QUOTE] I found your win32 binaries to be noticeably slower at stage 2 than my binary (see [URL="http://www.kay-schoenberger.de/eng/math/ecm/binaries/ecm621_win32.zip"]ecm621_win32.zip[/URL]). Are you sure that the asm-redc code is enabled here? |
[QUOTE=Yamato;166660]I found your win32 binaries to be noticeably slower at stage 2 than my binary (see [URL="http://www.kay-schoenberger.de/eng/math/ecm/binaries/ecm621_win32.zip"]ecm621_win32.zip[/URL]). Are you sure that the asm-redc code is enabled here?[/QUOTE]
Maybe not, I thought it was. What are you compiling with? |
[QUOTE=Jeff Gilchrist;166670]Maybe not, I thought it was. What are you compiling with?[/QUOTE]
I'm using mingw and msys (running WinXP 32-bit). I don't have any experience with MSVC. |
[QUOTE=Yamato;166763]I'm using mingw and msys (running WinXP 32-bit). I don't have any experience with MSVC.[/QUOTE]
Ok I found out what is going on. The Windows MSVC version uses a different syntax of assembler which Brian Gladman translates but the gcc version of the code is newer and more optimized and Brian hasn't had time to translate the latest assembler into Windows yet. So that explains the speed difference. The MSVC version is using asm-redc code, but just not the latest and greatest version. |
[quote=Jeff Gilchrist;166790]Ok I found out what is going on. The Windows MSVC version uses a different syntax of assembler which Brian Gladman translates but the gcc version of the code is newer and more optimized and Brian hasn't had time to translate the latest assembler into Windows yet. So that explains the speed difference. The MSVC version is using asm-redc code, but just not the latest and greatest version.[/quote]
I'm not so sure of this now since it is a 32-bit issue and I don't think the 32-bit assembler code has changed much recently. Maybe Alex can confirm this? Brian |
[QUOTE=Yamato;166660]I found your win32 binaries to be noticeably slower at stage 2 than my binary (see [URL="http://www.kay-schoenberger.de/eng/math/ecm/binaries/ecm621_win32.zip"]ecm621_win32.zip[/URL]). Are you sure that the asm-redc code is enabled here?[/QUOTE]
Does the win32 version turn on the SSE2 code used for NTTs in stage 2? This would give you a big speedup (~30% for P+-1 and maybe 10% for ECM, the last time I tested it) |
[quote=jasonp;166868]Does the win32 version turn on the SSE2 code used for NTTs in stage 2? This would give you a big speedup (~30% for P+-1 and maybe 10% for ECM, the last time I tested it)[/quote]
None of the 32-bit inline SSE2 code is used in Windows. I did convert the dedicated 32-bit assembler code files but none of the inline SSE2 routines. So this might explain the slow down. Brian |
Both the updated mulredc asm code for x86_64 (matters almost only in stage 1), and the new SSE2 asm code for the NTT (matters only in stage 2) courtesy of Jason were introduced in release 6.2. The two are independent - the former is enabled with --enable-asm-redc, the latter with --enable-sse2 (or if neither --enable-sse2 nor --disable-sse2 is given, enabled by default if the CPU is identified as a Pentium 4, disabled otherwise). I know Brian ported the mulredc code to MSVC, but I don't know about the SSE2 asm stuff... it seems to be inlined with GNU asm() syntax, so that would not work in an MSVC build. The non-SSE2 helper macros for modular arithmetic for the NTT are there in MSVC syntax, though. I think this means that currently, the SSE2 asm code for the NTT is not available in MSVC builds.
Then again, a 32 bit Windows binary can be made with MinGW, which does understand all the inlined gcc asm stuff... Alex |
[quote=akruppa;166884]Both the updated mulredc asm code for x86_64 (matters almost only in stage 1), and the new SSE2 asm code for the NTT (matters only in stage 2) courtesy of Jason were introduced in release 6.2. The two are independent - the former is enabled with --enable-asm-redc, the latter with --enable-sse2 (or if neither --enable-sse2 nor --disable-sse2 is given, enabled by default if the CPU is identified as a Pentium 4, disabled otherwise). I know Brian ported the mulredc code to MSVC, but I don't know about the SSE2 asm stuff... it seems to be inlined with GNU asm() syntax, so that would not work in an MSVC build. The non-SSE2 helper macros for modular arithmetic for the NTT are there in MSVC syntax, though. I think this means that currently, the SSE2 asm code for the NTT is not available in MSVC builds.
Then again, a 32 bit Windows binary can be made with MinGW, which does understand all the inlined gcc asm stuff...[/quote] Thanks Alex, this accords with my understanding. I might get a chance to translate the new x64 code but it is difficult to do so I am not even sure about this. I absolutely hate m4 and wish someone would bury it - if we could automate this with Python rather than m4 it would be a lot easier. There is no prospect that any 32-bit code will get converted to Windows (by me at least) as I am now only working on x64 assembler. Brian |
The m4-generated mulredc code hasn't changed much since you ported it. A few movq have been replaced by movl where possible, and one useless carry propagation has been removed. The speed is almost exactly the same, though. (Aside: no one likes my m4 scripts! :cry: Pierrick told me I was nuts to use m4 and patently refuses to look at the code, even though it's based on his assembly code... I thought m4 wasn't that bad actually, except maybe for the recursively expanded for-loops.)
As for the SSE2 code... I know practically nothing about the asm inlining syntax of MSVC. The inlined code comes in a few big chunks, maybe it could be translated pretty mechanically... Alex |
[quote=akruppa;166894]The m4-generated mulredc code hasn't changed much since you ported it. A few movq have been replaced by movl where possible, and one useless carry propagation has been removed. The speed is almost exactly the same, though. (Aside: no one like my m4 scripts! :cry: Pierrick told me I was nuts to use m4 and patently refuses to look at the code, even though it's based on his assembly code... I thought m4 wasn't that bad actually, except maybe for the recursively expanded for-loops.)
As for the SSE2 code... I know practically nothing about the asm inlining syntax of MSVC. The inlined code comes in a few big chunks, maybe it could be translated pretty mechanically...[/quote] It's mainly converting AT&T syntax to Intel syntax and setting % parameters into specific registers. You are right that it's not that hard to do, but it is tedious and error-prone. I have this largely automated for pure x64 assembler code but not for x86 stuff. Brian |
Sorry that the NTT asm is all in big blocks. It could be broken up into smaller macros that get composed together, but I think it would have been just as complex to figure out the looping, data access and register allocation in a modular way.
I think the aversion to m4 is that it's stack-based, and everyone's mental model of a full-featured preprocessor assumes execution is procedural. |
[QUOTE=akruppa;166884]The two are independent - the former is enabled with --enable-asm-redc, the latter with --enable-sse2 (or if neither --enable-sse2 nor --disable-sse2 is given, enabled by default if the CPU is identified as a Pentium 4, disabled otherwise). [/QUOTE]
I know that --enable-asm-redc is not enabled by default (it isn't even mentioned in INSTALL so I had no idea it existed until you mentioned it a month or so ago). And now I'm hearing about --enable-sse2 which is also not mentioned in INSTALL. Can you document these speed-increasing features somewhere? Now did you say that --enable-sse2 is done automatically by configure if it detects the right system or that is totally manual as well? Jeff. |
[QUOTE=jasonp;166935]Sorry that the NTT asm is all in big blocks. It could be broken up into smaller macros that get composed together, but I think it would have been just as complex to figure out the looping, data access and register allocation in a modular way.
[/QUOTE]
Based on my practically non-existent understanding of VC inline asm syntax, I think the large chunks may actually make translation easier... you have to move data to registers at the start and back to variables at the end (what gcc does for you via data constraints), with large chunks you don't have to do it as often.
[QUOTE=Jeff Gilchrist;166947]I know that --enable-asm-redc is not enabled by default (it isn't even mentioned in INSTALL so I had no idea it existed until you mentioned it a month or so ago). And now I'm hearing about --enable-sse2 which is also not mentioned in INSTALL. Can you document these speed-increasing features somewhere? [/QUOTE]
Umm, yeah... that's a documentation bug that needs fixing.
[QUOTE=Jeff Gilchrist;166947] Now did you say that --enable-sse2 is done automatically by configure if it detects the right system or that is totally manual as well? Jeff.[/QUOTE]
As mentioned, if neither --enable-sse2 nor --disable-sse2 is given, the SSE2 code is enabled by default if the CPU is identified as a Pentium 4, and disabled otherwise. The sticky bit here is "if the CPU is identified as a Pentium 4," because that isn't terribly reliable. We go by what config.guess tells us right now, which is mostly based on what uname reports. In older versions of autotools, Pentium 4 was reported as pentium4-*-*, but it seems now it's called i786-*-*. On a Pentium 4 in our lab, uname and hence config.guess returns i686. So right now ./configure detecting a Pentium 4 as such is a matter of sheer luck, until we improve the detection as discussed in [url]http://www.mersenneforum.org/showthread.php?t=11466[/url] Alex |
All modern Intel/AMD CPUs support SSE2 instructions, so why does the configuration routine only accept the (outdated) Pentium 4?
|
The NTT code uses 62-bit primes on 64-bit systems, without SSE2 instructions.
Alex |
[QUOTE=akruppa;166961]The NTT code uses 62-bit primes on 64-bit systems, without SSE2 instructions.[/QUOTE]
So for 32bit systems, using SSE2 should provide a speedup, but with 64bit systems it is better not to use the SSE2 right? Jeff. |
On 64 bit systems, the --enable-sse2 configure option has no effect. The SSE2 code is compiled in if and only if gcc or icc is used, the __i386__ macro is set by the preprocessor (it's not set by gnu cpp on x86-64 systems), and HAVE_SSE2 is defined. All --enable-sse2 does is define HAVE_SSE2.
Alex |
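The two rules Alex describes (when HAVE_SSE2 gets defined, and when the SSE2 code is actually compiled in) can be written down as plain predicates. This is only a model of the description in these posts, not code taken from the GMP-ECM configure script; both function names and their parameters are invented for the illustration.

```python
def have_sse2_defined(sse2_option, cpu_identified_as_pentium4):
    """Model of the configure default described above.

    sse2_option: True for --enable-sse2, False for --disable-sse2,
                 None if neither option was given.
    """
    if sse2_option is None:
        return cpu_identified_as_pentium4
    return sse2_option


def sse2_code_compiled(compiler, i386_macro_set, have_sse2):
    """SSE2 code is compiled in only if all three conditions hold."""
    return compiler in ("gcc", "icc") and i386_macro_set and have_sse2


# A 64-bit gcc build: __i386__ is not set, so --enable-sse2 has no effect.
assert not sse2_code_compiled("gcc", False, have_sse2_defined(True, False))
# A 32-bit gcc build with --enable-sse2 on a CPU not detected as a Pentium 4.
assert sse2_code_compiled("gcc", True, have_sse2_defined(True, False))
```

Written this way, the second predicate makes it easy to see why the option is a no-op on x86-64: the `__i386__` condition fails regardless of the flag.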
[quote=Brian Gladman;166900]Its mainly convering AT&T syntax to Intel syntax and setting % parameters into specific registers. You are right that its not that hard to do but it is tedious and error prone. I have this largely automated for pure x64 assembler code but not for x86 stuff.[/quote]
I have now converted the SSE2 inline code for use on 32-bit Windows. I have put the updated code in the SVN repository. Brian |
Cool! This is great, thank you!
Alex |
[QUOTE=akruppa;166991]Cool! This is great, thank you!
[/QUOTE] I will test out that new code as soon as I get a chance, but first something less fun, taxes.. :down: |
[quote=akruppa;166991]Cool! This is great, thank you!
[/quote] It's a pleasure to help Alex. I found that a few changes were needed to build the current GMP-ECM version with Visual Studio so I have updated the Windows build projects in the SVN repository. I have also taken the opportunity to add the 'tune' program to the Windows build. And it is, I hope, now a bit easier to select the win32/x64 and AMD/Intel build configurations. This has had limited testing so I would appreciate feedback from users on Windows, especially if they run into problems. The tune program doesn't need to be built but if this is done it takes some time because it is executed as soon as it is built. The new files are now in the SVN repository. Brian |
[QUOTE=Jeff Gilchrist;166979]So for 32bit systems, using SSE2 should provide a speedup, but with 64bit systems it is better not to use the SSE2 right?
[/QUOTE] Yes. Opteron, Core, Core2 systems all would benefit in 32-bit mode from the asm. I doubt that the configure script would be able to figure out from shell commands alone what it's running on, short of interrogating /proc/cpuinfo. |
What about having an optional flag for ./configure that tells it what to compile for?
|
It has: you can use the "--build" command line parameter to override automatic detection of the system type. For example, if config.guess identifies your Pentium 4 as merely an i686, you can use "--build=pentium4" to make configure use mulredc asm code from pentium4/ instead of athlon/.
Alex |
[quote=Brian Gladman;167035]It's a pleasure to help Alex.
I found that a few changes were needed to build the current GMP-ECM version with Visual Studio so I have updated the Windows build projects in the SVN repository. I have also taken the opportunity to add the 'tune' program to the Windows build. And it is, I hope, now a bit easier to select the win32/x64 and AMD/Intel build configurations. This has had limited testing so I would appreciate feedback from users on Windows, especially if they run into problems. The tune program doesn't need to be built but if this is done it takes some time because it is executed as soon as it is built. The new files are now in the SVN repository.[/quote] I have made a further update to SVN for the Windows build to tidy up several issues. Please treat 'tune' on Windows with extreme caution as it produces inconsistent results for several thresholds. Brian |
[QUOTE=Brian Gladman;167102]I have made a further update to SVN for the Windows build to tidy up several issues.[/QUOTE]
I have updated my binaries page with new versions of GMP-ECM 6.2.2 that includes Brian's SSE2 instructions for Win32 so the Stage2 timings are now faster and hopefully similar to the UNIX binaries on Windows. [url]http://gilchrist.ca/jeff/factoring/[/url] Jeff. |
Has anybody measured how much faster it is?
yoyo |
[QUOTE=yoyo;167147]Does anybody made a measurement how much faster it is?[/QUOTE]
You mean the SSE2 vs non-SSE2 Windows binaries?
The new SSE2 32bit binary:
[CODE]GMP-ECM 6.2.2 [powered by GMP 4.2.1_MPIR_1.0.0] [ECM]
Input number is 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561 (85 digits)
Using B1=300000000, B2=3178599824416, polynomial Dickson(30), sigma=3509569131
Step 1 took 1247383ms
Step 2 took 415962ms

real 28m26.768s
user 0m0.000s
sys 0m0.015s[/CODE]
The older non-SSE2 32bit binary:
[CODE]GMP-ECM 6.2.2 [powered by GMP 4.2.1_MPIR_1.0.0] [ECM]
Input number is 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561 (85 digits)
Using B1=300000000, B2=3178599824416, polynomial Dickson(30), sigma=3509569131
Step 1 took 1254529ms
Step 2 took 465320ms

real 29m20.967s
user 0m0.000s
sys 0m0.031s[/CODE]
You can see stage 2 is faster in the SSE2 version. |
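For anyone who wants the difference as a percentage, the "Step N took ...ms" lines are easy to pull out of the output. A small sketch follows, using only the two pairs of step timings posted above; `step_times` is a name made up here.

```python
import re

STEP_RE = re.compile(r"Step (\d) took (\d+)ms")


def step_times(output):
    """Map step number to milliseconds for each 'Step N took Xms' line."""
    return {int(step): int(ms) for step, ms in STEP_RE.findall(output)}


sse2_run = step_times("Step 1 took 1247383ms\nStep 2 took 415962ms")
old_run = step_times("Step 1 took 1254529ms\nStep 2 took 465320ms")

# Fraction of stage 2 time saved by the SSE2 build relative to the older one.
stage2_saving = 1 - sse2_run[2] / old_run[2]

if __name__ == "__main__":
    print("stage 2 time saved: %.1f%%" % (100 * stage2_saving))
```

For this single pair of runs the stage 2 saving comes out at a little over 10%, while stage 1 is essentially unchanged.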
[QUOTE=Jeff Gilchrist;167304]You can see stage 2 is faster in the SSE2 version.[/QUOTE]
That doesn't change the fact that the [URL="http://www.kay-schoenberger.de/eng/math/ecm/binaries/ecm621_win32.zip"]MinGW/gcc-binary[/URL] is still faster at stage 2:
[CODE]GMP-ECM 6.2.1 [powered by GMP 4.2.4] [ECM]
Input number is ((10^197-9^197)/916013359)/145293034519 (177 digits)
Using B1=3000000, B2=5706890290, polynomial Dickson(6), sigma=2147216171
Step 1 took 40203ms
Step 2 took 11860ms[/CODE]
[CODE]GMP-ECM 6.2.2 [powered by GMP 4.2.1_MPIR_1.0.0] [ECM]
Input number is ((10^197-9^197)/916013359)/145293034519 (177 digits)
Using B1=3000000, B2=5706890290, polynomial Dickson(6), sigma=2607103582
Step 1 took 39891ms
Step 2 took 15859ms[/CODE]
The MSVC compiler seems not to be 'optimal' in this case (or maybe the reason is gmp 4.2.1 ?!). |
I've tried the newest 64bit core2 version from Jeff's site and tested it against the above c85
[CODE]GMP-ECM 6.2.2 [powered by GMP 4.2.1_MPIR_1.0.0] [ECM]
Input number is 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561 (85 digits)
Using B1=3000000, B2=5706890290, polynomial Dickson(6), sigma=3473972786
Step 1 took 8673ms
Step 2 took 7332ms[/CODE]
I'm new to linux, but i've finally managed to compile my own ecm and am surprised that it's significantly faster even though it's running in VMWare
[CODE]GMP-ECM 6.2.2 [powered by GMP 4.2.4] [ECM]
Input number is 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561 (85 digits)
Using B1=3000000, B2=5706890290, polynomial Dickson(6), sigma=1798745233
Step 1 took 7872ms
Step 2 took 4660ms[/CODE] |
[QUOTE=smh;168354]I've tried the newest 64bit core2 version from Jeff's site and tested it against the above c85
[CODE]GMP-ECM 6.2.2 [powered by GMP 4.2.1_MPIR_1.0.0] [ECM]
Input number is 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561 (85 digits)
Using B1=3000000, B2=5706890290, polynomial Dickson(6), sigma=3473972786
Step 1 took 8673ms
Step 2 took 7332ms[/CODE][/quote]
What processor do you have and what speed is it? I noticed your B2 value is much lower than mine (my B2=3178599824416) and you used a different sigma for your test (my sigma=3509569131). Were you just lowering that to reduce the time it takes to do the test? The Windows MSVC code uses a different set of assembler than the Linux code so it doesn't surprise me that the timing is different. If you choose the same sigma for both your Windows and Linux tests, and choose a larger B2 value so the test runs a little longer do you still see the huge difference? Try running each test twice just to make sure the numbers are similar in case your system decided to do something during the test and artificially slowed down the benchmark for one. Jeff. |
I see that you used a B1=300M, i used 3M.
I wasn't comparing directly with your run. I did two runs on my laptop (Core2duo T7800 @2,6GHz) on both the host (64-bit Vista) and a VM (64-bit Ubuntu 8.10). |
@smh:
Could you please post this binary and/or compare it with [URL="http://www.kay-schoenberger.de/eng/math/ecm/binaries/ecm621_lin64.tar.gz"]my 64-bit binary[/URL]? I found binaries optimised for Athlon64 are even faster on Core2, in comparison to Core2-optimised ones. |
[QUOTE=smh;168384]I see that you used a B1=300M, i used 3M.[/QUOTE]
Ah, that would explain the difference. :smile:
[QUOTE=smh;168384]I wasn't comparing directly with your run. I did two runs on my laptop (Core2duo T7800 @2,6GHz) on both the host (64-bit Vista) and a VM (64-bit Ubuntu 8.10).[/QUOTE]
I realize that, I'm just trying to figure out why there is such a big difference. If you had an AMD processor then I could see how the Core2 version would be slower than the Linux version (which would have detected an AMD processor if you compiled it yourself). As I said before, Brian Gladman had to translate the assembler from the syntax used by GCC to the one that YASM (used in the MSVC build) understands. I think he said that some of the code in the Linux source is still newer than what he has translated. Since I'm not familiar with the code, I'm not sure why there is such a big difference. Jeff. |
[QUOTE=Yamato;168385]@smh:
Could you please post this binary and/or compare it with [URL="http://www.kay-schoenberger.de/eng/math/ecm/binaries/ecm621_lin64.tar.gz"]my 64-bit binary[/URL]? I found binaries optimised for Athlon64 are even faster on Core2, in comparison to Core2-optimised ones.[/QUOTE]
With B1 <= 1M, there is too much variation to see. With larger B1 yours is consistently faster in step 2, [URL="http://www.angelfire.com/falcon/aliquot/ecm.tar.gz"]mine[/URL] most of the time in step one. I did limited testing, but with larger composites yours might also be faster in step 1. Notice i used GMP-ECM 6.2.2 and GMP 4.2.4 (with the core2 patch), so it might be apples and oranges.
With B1=3M
[code]GMP-ECM 6.2.1 [powered by GMP 4.2.3] [ECM]
Input number is 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561 (85 digits)
Using B1=3000000, B2=5706890290, polynomial Dickson(6), sigma=959787799
Step 1 took 8008ms
Step 2 took 4496ms
Using B1=3000000, B2=3000000-5706890290, polynomial Dickson(6), sigma=1211299266
Step 1 took 7865ms
Step 2 took 4328ms
Using B1=3000000, B2=3000000-5706890290, polynomial Dickson(6), sigma=573230298
Step 1 took 7989ms
Step 2 took 4340ms

GMP-ECM 6.2.2 [powered by GMP 4.2.4] [ECM]
Input number is 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561 (85 digits)
Using B1=3000000, B2=5706890290, polynomial Dickson(6), sigma=937001321
Step 1 took 7808ms
Step 2 took 4500ms
Using B1=3000000, B2=3000000-5706890290, polynomial Dickson(6), sigma=1410435444
Step 1 took 7773ms
Step 2 took 4500ms
Using B1=3000000, B2=3000000-5706890290, polynomial Dickson(6), sigma=3426145601
Step 1 took 7921ms
Step 2 took 4500ms[/code]
With B1=11M
[code]GMP-ECM 6.2.1 [powered by GMP 4.2.3] [ECM]
Input number is 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561 (85 digits)
Using B1=11000000, B2=35133391030, polynomial Dickson(12), sigma=1064336844
Step 1 took 29329ms
Step 2 took 14061ms
Using B1=11000000, B2=11000000-35133391030, polynomial Dickson(12), sigma=3355605506
Step 1 took 28858ms
Step 2 took 14157ms
Using B1=11000000, B2=11000000-35133391030, polynomial Dickson(12), sigma=191990272
Step 1 took 29342ms
Step 2 took 14181ms

GMP-ECM 6.2.2 [powered by GMP 4.2.4] [ECM]
Input number is 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561 (85 digits)
Using B1=11000000, B2=35133391030, polynomial Dickson(12), sigma=1387859769
Step 1 took 28389ms
Step 2 took 14777ms
Using B1=11000000, B2=11000000-35133391030, polynomial Dickson(12), sigma=4281716356
Step 1 took 27850ms
Step 2 took 14685ms
Using B1=11000000, B2=11000000-35133391030, polynomial Dickson(12), sigma=3779197836
Step 1 took 27638ms
Step 2 took 14681ms[/code] |
I took ECM 6.2.2 and compiled it with MPIR 1.0 in cygwin to compare the LINUX code to what Windows MSVC code is doing. I saw a similar pattern to all of you as well. This is all 32bit code run on an Intel Core2 Q9550 @ 3.4GHz.
[B]ECM[/B]
Factoring: 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561
B1=20000000
Sigma: 980060817
MSVC 6.2.2 with new SSE2:
Step 1 took 82837ms | Step 1 took 82790ms
Step 2 took 41137ms | Step 2 took 41402ms
MSVC 6.2.2 without SSE2:
Step 1 took 82867ms | Step 1 took 83071ms
Step 2 took 42557ms | Step 2 took 43337ms
GCC cygwin (--enable-sse2 -enable-asm-redc) builds as pentium3
Step 1 took 78359ms | Step 1 took 78531ms
Step 2 took 34695ms | Step 2 took 34086ms
GCC cygwin (--enable-sse2 -enable-asm-redc --build=pentium4-pc-cygwin)
Step 1 took 78375ms | Step 1 took 78718ms
Step 2 took 24445ms | Step 2 took 24367ms
[B]P-1[/B]
Factoring: 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561
B1=20000000
x0: 524328229
MSVC 6.2.2 with new SSE2:
Step 1 took 9469ms | Step 1 took 9563ms
Step 2 took 7098ms | Step 2 took 7051ms
MSVC 6.2.2 without SSE2:
Step 1 took 9360ms | Step 1 took 9235ms
Step 2 took 11731ms | Step 2 took 11404ms
GCC cygwin (--enable-sse2 -enable-asm-redc) builds as pentium3
Step 1 took 8751ms | Step 1 took 8487ms
Step 2 took 5788ms | Step 2 took 5740ms
GCC cygwin (--enable-sse2 -enable-asm-redc --build=pentium4-pc-cygwin)
Step 1 took 8455ms | Step 1 took 8658ms
Step 2 took 5788ms | Step 2 took 5710ms
[B]P+1[/B]
Factoring: 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561
B1=20000000
x0: 524328229
MSVC 6.2.2 with new SSE2:
Step 1 took 17082ms | Step 1 took 17145ms
Step 2 took 8596ms | Step 2 took 8408ms
MSVC 6.2.2 without SSE2:
Step 1 took 17675ms | Step 1 took 17566ms
Step 2 took 15585ms | Step 2 took 15553ms
GCC cygwin (--enable-sse2 -enable-asm-redc) builds as pentium3
Step 1 took 14570ms | Step 1 took 14617ms
Step 2 took 7566ms | Step 2 took 7816ms
GCC cygwin (--enable-sse2 -enable-asm-redc --build=pentium4-pc-cygwin)
Step 1 took 14929ms | Step 1 took 14602ms
Step 2 took 7706ms | Step 2 took 7862ms
You can see that the new MSVC build that uses SSE2 is faster in Stage 2 than the old build (much faster for P-1 and P+1, only slightly for ECM), but the Linux code built with gcc (in cygwin on Windows or whatever) is faster in both Stage 1 and Stage 2. So if you want the fastest possible ECM/P-1/P+1 you could install cygwin/mingw, or run Linux either natively or in a VM. Jeff. |
[QUOTE]GCC cygwin (--enable-sse2 --enable-asm-redc) builds as pentium3
Step 1 took 78359ms | Step 1 took 78531ms
Step 2 took 34695ms | Step 2 took 34086ms

GCC cygwin (--enable-sse2 --enable-asm-redc --build=pentium4-pc-cygwin)
Step 1 took 78375ms | Step 1 took 78718ms
Step 2 took 24445ms | Step 2 took 24367ms[/QUOTE]
This I find a bit strange... --enable-sse2 should always enable SSE2 code in stage 2, independent of build type (so long as it's a 32-bit build), so the stage 2 timings should not differ by this much. Did "HAVE_SSE2" get defined in config.h in both cases?

Then, with build type pentium4, the mulredc asm code from pentium4/ should be used instead of the code from athlon/, so on an actual Pentium 4 at least, the stage 1 time should differ. On what CPU type did you run these tests?

Alex |
[QUOTE=akruppa;168775]This I find a bit strange... --enable-sse2 should always enable SSE2 code in stage 2, independent of build type (so long as it's a 32-bit build), so the stage 2 timings should not differ by this much. Did "HAVE_SSE2" get defined in config.h in both cases?
Then, with build type pentium4, the mulredc asm code from pentium4/ should be used instead of the code from athlon/, so on an actual Pentium 4 at least, the stage 1 time should differ. On what CPU type did you run these tests?[/QUOTE]
For whatever reason it thought my Intel Core2 Q9550 @ 3.4GHz was a pentium3 if I just let configure do its own thing.

Both config.h files contain
#define HAVE_SSE2 1

Both linked the mulredc files from pentium4/

Jeff. |
[QUOTE=Jeff Gilchrist;168778]For whatever reason it thought my Intel Core2 Q9550 @ 3.4GHz was a pentium3 if I just let configure do its own thing.[/QUOTE]
Are you referring to GMP or GMP-ECM thinking it is a P3? My understanding (from the GMP folks) is that the Core 2 is built on a P3 architecture, not the P4 architecture, thus the P3 optimizations work better than the P4 optimizations. That doesn't explain the difference of your ECM run. |
[QUOTE=rogue;168780]Are you referring to GMP or GMP-ECM thinking it is a P3? My understanding (from the GMP folks) is that the Core 2 is built on a P3 architecture, not the P4 architecture, thus the P3 optimizations work better than the P4 optimizations. That doesn't explain the difference of your ECM run.[/QUOTE]
GMP-ECM thinks it's a P3. MPIR has core2/nocona specific code, which was used when building that.

Jeff. |
Jeff, could we please have P4-optimized binaries of GMP-ECM?
|
[QUOTE=10metreh;168795]Jeff, could we please have P4-optimized binaries of GMP-ECM?[/QUOTE]
If you are talking about MSVC 32bit versions, that is what the 32bit build is made for: the Pentium 4. But as we have seen here, it might be faster to roll your own Linux/cygwin version. There are no 64bit Pentium4 versions. |
gmp-ecm 6.2.2, [B]gmp 4.3.0[/B], Linux-64-Bit, Core 2, asm-redc: [URL="http://www.kay-schoenberger.de/eng/math/ecm/binaries/ecm622_lin64.tar.gz"]ecm622_lin64.tar.gz[/URL]
|
GMP-ECM [B]6.2.3[/B] with MPIR 1.1.1 compiled for Windows 32bit & 64bit using Visual Studio 2008:
[url]http://gilchrist.ca/jeff/factoring/[/url] |
Can you please compile ecm 6.2.3 with GMP 4.3.0 for Core2 Win32 ?
It will be interesting to see the speedup from GMP 4.3.0 |
[QUOTE=Andi_HB;171046]Can you please compile ecm 6.2.3 with GMP 4.3.0 for Core2 Win32 ?
It will be interesting to see the speedup from GMP 4.3.0[/QUOTE] Is that aimed at me? I can give you cygwin benchmarks with GMP 4.3.0 but can't compare GMP 4.3.0 with MSVC builds because GMP 4.3.0 has no Windows compiler support. |
[quote=Jeff Gilchrist;171050]Is that aimed at me? I can give you cygwin benchmarks with GMP 4.3.0 but can't compare GMP 4.3.0 with MSVC builds because GMP 4.3.0 has no Windows compiler support.[/quote]
what about mingw? can't you cross compile stuff for windows with that? |
Here are some more binaries for gmp-ecm 6.2.3 + gmp 4.3.0:
Linux, 64bit, Core2: [URL="http://www.kay-schoenberger.de/eng/math/ecm/binaries/ecm623_lin64.tar.gz"]ecm623_lin64.tar.gz[/URL]
Linux, 32bit, Pentium4: [URL="http://www.kay-schoenberger.de/eng/math/ecm/binaries/ecm623_lin32.tar.gz"]ecm623_lin32.tar.gz[/URL]
Windows, 32bit, Core2: [URL="http://www.kay-schoenberger.de/eng/math/ecm/binaries/ecm623_win32.zip"]ecm623_win32.zip[/URL] |
Timings
[QUOTE=Yamato;171060]Here are some more binaries for gmp-ecm 6.2.3 + gmp 4.3.0:
Windows, 32bit, Core2: [URL="http://www.kay-schoenberger.de/eng/math/ecm/binaries/ecm623_win32.zip"]ecm623_win32.zip[/URL][/QUOTE]
Timings on a Core2Duo at 1.8 GHz (running under Win XP 32 bit)

GMP-ECM 6.2 with GMP 4.2.2, compiled with MinGW/Msys
[code]GMP-ECM 6.2 [powered by GMP 4.2.2] [ECM]
Input number is 4363521036736243362909434674593128775074296867895664746056146512001110140536995910538826352740264283689 (103 digits)
Using B1=1000000, B2=1045563762, polynomial Dickson(6), sigma=1806323290
Step 1 took 13672ms
Step 2 took 7734ms
Run 2 out of 651:
Using B1=1000000, B2=1045563762, polynomial Dickson(6), sigma=1282855583
Step 1 took 13719ms
Step 2 took 7750ms
Run 3 out of 651:
Using B1=1000000, B2=1045563762, polynomial Dickson(6), sigma=1254100664
Step 1 took 13812ms
Step 2 took 7907ms
Run 4 out of 651:
Using B1=1000000, B2=1045563762, polynomial Dickson(6), sigma=630890257
Step 1 took 13718ms
Step 2 took 7828ms
Run 5 out of 651:
Using B1=1000000, B2=1045563762, polynomial Dickson(6), sigma=1730893806
Step 1 took 13969ms
Step 2 took 7781ms[/code]

GMP-ECM 6.2.3 with GMP 4.3.0 (precompiled from Yamato)
[code]GMP-ECM 6.2.3 [powered by GMP 4.3.0] [ECM]
Input number is 4363521036736243362909434674593128775074296867895664746056146512001110140536995910538826352740264283689 (103 digits)
Using B1=1000000, B2=1045563762, polynomial Dickson(6), sigma=3642599884
Step 1 took 10141ms
Step 2 took 4750ms
Run 2 out of 5:
Using B1=1000000, B2=1045563762, polynomial Dickson(6), sigma=2597421712
Step 1 took 10062ms
Step 2 took 4750ms
Run 3 out of 5:
Using B1=1000000, B2=1045563762, polynomial Dickson(6), sigma=2261704683
Step 1 took 10391ms
Step 2 took 4812ms
Run 4 out of 5:
Using B1=1000000, B2=1045563762, polynomial Dickson(6), sigma=3276911541
Step 1 took 9875ms
Step 2 took 4797ms
Run 5 out of 5:
Using B1=1000000, B2=1045563762, polynomial Dickson(6), sigma=4096292858
Step 1 took 10719ms
Step 2 took 4750ms
[/code]

Now THAT's a speedup! :party: |
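To put a number on that speedup, here is a minimal Python sketch that pulls the "Step N took Xms" lines out of the two logs above and compares the averages. The function and variable names are made up for illustration; only the timings come from the post.

```python
import re
from statistics import mean

# Matches lines such as "Step 1 took 13672ms" in GMP-ECM output.
STEP_RE = re.compile(r"Step (\d) took (\d+)ms")

def mean_step_times(log: str) -> dict:
    """Return {step number: mean time in ms} for every step found in the log."""
    times = {}
    for step, ms in STEP_RE.findall(log):
        times.setdefault(int(step), []).append(int(ms))
    return {step: mean(values) for step, values in times.items()}

# Timings copied from the GMP-ECM 6.2 / GMP 4.2.2 run above.
OLD_LOG = """
Step 1 took 13672ms
Step 2 took 7734ms
Step 1 took 13719ms
Step 2 took 7750ms
Step 1 took 13812ms
Step 2 took 7907ms
Step 1 took 13718ms
Step 2 took 7828ms
Step 1 took 13969ms
Step 2 took 7781ms
"""

# Timings copied from the GMP-ECM 6.2.3 / GMP 4.3.0 run above.
NEW_LOG = """
Step 1 took 10141ms
Step 2 took 4750ms
Step 1 took 10062ms
Step 2 took 4750ms
Step 1 took 10391ms
Step 2 took 4812ms
Step 1 took 9875ms
Step 2 took 4797ms
Step 1 took 10719ms
Step 2 took 4750ms
"""

def speedup(old_log: str, new_log: str) -> dict:
    """Return {step number: old mean / new mean} for steps present in both logs."""
    old, new = mean_step_times(old_log), mean_step_times(new_log)
    return {step: old[step] / new[step] for step in old if step in new}

if __name__ == "__main__":
    for step, factor in sorted(speedup(OLD_LOG, NEW_LOG).items()):
        print(f"Step {step}: {factor:.2f}x faster")
```

Keep in mind that the two binaries differ in more than the GMP version (see the replies below), so the ratio says nothing about where the gain comes from.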
[QUOTE=Yamato;171060]Here are some more binaries for gmp-ecm 6.2.3 + gmp 4.3.0:[/QUOTE]
What command lines did you use for configure to compile gmp-ecm and gmp 4.3.0 for Win32?

Thanks,
Jeff. |
[QUOTE=henryzz;171053]what about mingw?
can't you cross compile stuff for windows with that?[/QUOTE]
Yes, you can create a 32bit binary that will run on Windows with mingw. I was just pointing out that GMP 4.3.0 does not have Windows support, so it can't be used by the Visual Studio compiler; you need something like gcc (i.e. in cygwin or mingw) to compile the Linux version of the source. |
[QUOTE=Andi47;171137]Now THAT's a speedup! :party:[/QUOTE]
That is a huge difference. My guess is that your GMP-ECM 6.2 with GMP 4.2.2 binary was probably *not* using the mulredc assembler code. That alone would make a huge difference. |
[QUOTE=Jeff Gilchrist;171153]That is a huge difference. My guess is that your GMP-ECM 6.2 with GMP 4.2.2 binary was probably *not* using the mulredc assembler code. That alone would make a huge difference.[/QUOTE]
I had compiled my GMP-ECM 6.2 / GMP 4.2.2 binary with these command lines:
[code]./configure --with-gmp=/usr/local --enable-asm-redc
make
make check
make ecm-params; make
make install
[/code]
But GMP 4.2.2 had (mis-)detected my Core2Duo as Pentium 3, so that might be an additional reason for the huge difference. |
[QUOTE=Andi47;171162]But GMP 4.2.2 had (mis-)detected my Core2Duo as Pentium 3, so that might be an additional reason for the huge difference.[/QUOTE]
That, and gmp-ecm probably picked pentium3 as well, so it didn't enable the sse2 code on your 6.2 binary either.

Jeff. |
[QUOTE=Jeff Gilchrist;171146]What command lines did you use for configure to compile gmp-ecm and gmp 4.3.0 for Win32?
Thanks, Jeff.[/QUOTE]
I used MinGW+msys under WinXP on Intel E8500 (Wolfdale).

./configure
(for gmp, without any options. sse2 was enabled automatically)

./configure --with-gmp=/local --enable-asm-redc --enable-sse2
(for ecm)

Interestingly, my older Notebook CPU (Intel T7300) wasn't recognized as Core2. |
[QUOTE=Yamato;171184]I used MinGW+msys under WinXP on Intel E8500 (Wolfdale).
./configure (for gmp, without any options. sse2 was enabled automatically)
./configure --with-gmp=/local --enable-asm-redc --enable-sse2 (for ecm)[/QUOTE]
Thanks for the info. Since you didn't force it to use --build=pentium4 for ecm, did you happen to notice what it detected it as (i686?)? You might want to try using pentium4, as I saw a speedup from the default i686 selection:

./configure --with-gmp=/local --enable-asm-redc --enable-sse2 --build=pentium4

Jeff. |
Without --build=pentium4 but with --enable-asm-redc, it'll use the asm mulredc code from athlon/ which is slower on Pentium 4. With --build=pentium4 and --enable-asm-redc, it'll use the asm mulredc code from pentium4/. Without --enable-asm-redc, it'll use only GMP functions, which seems to be faster than our own assembly code if GMP 4.3.0 is used. This is why --enable-asm-redc is enabled by default only on x86-64 now.
Alex |
Thx to all for the good work - and special thx to Yamato.
The binary from Yamato is the fastest till now for my Core 2 Duo T8100 on WinVista 32bit :beer: |
[QUOTE=akruppa;171197]Without --build=pentium4 but with --enable-asm-redc, it'll use the asm mulredc code from athlon/ which is slower on Pentium 4. With --build=pentium4 and --enable-asm-redc, it'll use the asm mulredc code from pentium4/. Without --enable-asm-redc, it'll use only GMP functions, which seems to be faster than our own assembly code if GMP 4.3.0 is used. This is why --enable-asm-redc is enabled by default only on x86-64 now.[/QUOTE]
Hmmm, my testing seems to indicate otherwise.

*** Windows ECM BENCHMARK ***
Factoring: 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561
Sigma: 980060817
Using B1=20000000, B2=2158570060, polynomial Dickson(6), sigma=980060817

===================================================
./configure --with-gmp=/home/Jeff/gmp-4.3.0/ --enable-asm-redc --enable-sse2 --build=pentium4-pc-cygwin
===================================================
Step 1 took 80761ms
Step 2 took 2683ms
real 1m23.570s

===================================================
./configure --with-gmp=/home/Jeff/gmp-4.3.0/ --enable-sse2 --build=pentium4-pc-cygwin
===================================================
Step 1 took 86674ms
Step 2 took 2698ms
real 1m29.607s

You can see that even with GMP 4.3.0, using --enable-asm-redc is faster.

Jeff. |
Another binary, built with
./configure --with-gmp=/local --enable-sse2 --build=pentium4

[URL="http://www.kay-schoenberger.de/eng/math/ecm/binaries/ecm623_win32_2.zip"]ecm623_win32_2.zip[/URL]

This seems to be faster only for input numbers > 4.6*10^192. |
[QUOTE=Yamato;171270]This seems to be faster only for input numbers > 4.6*10^192.[/QUOTE]
Some more benchmarks using different size inputs from my binaries. In my case it seems that a C65, C85, and C130 are all faster using the mulredc code instead of leaving it up to GMP 4.3.0 alone:
[CODE]*** Windows 32bit ECM BENCHMARK ***

Factoring: 34053408309992030649212497354061832056920539397279047809781589871
Sigma: 980060817

===================================================
./configure --with-gmp=/home/Jeff/gmp-4.3.0/ --enable-asm-redc --enable-sse2 --build=pentium4-pc-cygwin
===================================================
GMP-ECM 6.2.3 [powered by GMP 4.3.0] [ECM]
Input number is 34053408309992030649212497354061832056920539397279047809781589871 (65 digits)
Using B1=20000000, B2=70272304840, polynomial Dickson(12), sigma=980060817
Step 1 took 58407ms
Step 2 took 19344ms

real 1m18.148s
user 1m17.766s
sys 0m0.109s

===================================================
./configure --with-gmp=/home/Jeff/gmp-4.3.0/ --enable-sse2 --build=pentium4-pc-cygwin
===================================================
GMP-ECM 6.2.3 [powered by GMP 4.3.0] [ECM]
Input number is 34053408309992030649212497354061832056920539397279047809781589871 (65 digits)
Using B1=20000000, B2=70272304840, polynomial Dickson(12), sigma=980060817
Step 1 took 62821ms
Step 2 took 19921ms

real 1m23.117s
user 1m22.757s
sys 0m0.061s

Factoring: 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561
Sigma: 980060817

===================================================
./configure --with-gmp=/home/Jeff/gmp-4.3.0/ --enable-asm-redc --enable-sse2 --build=pentium4-pc-cygwin
===================================================
GMP-ECM 6.2.3 [powered by GMP 4.3.0] [ECM]
Input number is 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561 (85 digits)
Using B1=20000000, B2=70272304840, polynomial Dickson(12), sigma=980060817
Step 1 took 79997ms
Step 2 took 23993ms

real 1m45.140s
user 1m44.021s
sys 0m0.108s

===================================================
./configure --with-gmp=/home/Jeff/gmp-4.3.0/ --enable-sse2 --build=pentium4-pc-cygwin
===================================================
GMP-ECM 6.2.3 [powered by GMP 4.3.0] [ECM]
Input number is 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561 (85 digits)
Using B1=20000000, B2=70272304840, polynomial Dickson(12), sigma=980060817
Step 1 took 87001ms
Step 2 took 23946ms

real 1m52.173s
user 1m50.947s
sys 0m0.108s

Factoring: 3561374769003472006611194942083317805928391841857811709042682130841367523415658737688338172847927090359833780290773316642214955689
Sigma: 980060817

===================================================
./configure --with-gmp=/home/Jeff/gmp-4.3.0/ --enable-asm-redc --enable-sse2 --build=pentium4-pc-cygwin
===================================================
GMP-ECM 6.2.3 [powered by GMP 4.3.0] [ECM]
Input number is 3561374769003472006611194942083317805928391841857811709042682130841367523415658737688338172847927090359833780290773316642214955689 (130 digits)
Using B1=20000000, B2=70272304840, polynomial Dickson(12), sigma=980060817
Step 1 took 150479ms
Step 2 took 39250ms

real 3m11.015s
user 3m9.759s
sys 0m0.093s

===================================================
./configure --with-gmp=/home/Jeff/gmp-4.3.0/ --enable-sse2 --build=pentium4-pc-cygwin
===================================================
GMP-ECM 6.2.3 [powered by GMP 4.3.0] [ECM]
Input number is 3561374769003472006611194942083317805928391841857811709042682130841367523415658737688338172847927090359833780290773316642214955689 (130 digits)
Using B1=20000000, B2=70272304840, polynomial Dickson(12), sigma=980060817
Step 1 took 155174ms
Step 2 took 39515ms

real 3m15.745s
user 3m14.704s
sys 0m0.187s[/CODE] |
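For anyone who wants the Step 1 difference as a percentage rather than raw milliseconds, a rough Python sketch follows. The Step 1 timings are copied from the C65, C85 and C130 runs above; the names are invented for illustration.

```python
# (digits, Step 1 ms with --enable-asm-redc, Step 1 ms without it),
# copied from the C65, C85 and C130 benchmarks in the post above.
STEP1_RUNS = [
    (65, 58407, 62821),
    (85, 79997, 87001),
    (130, 150479, 155174),
]

def asm_redc_gain(with_ms: int, without_ms: int) -> float:
    """Percentage of Step 1 time saved by building with --enable-asm-redc."""
    return (without_ms - with_ms) / without_ms * 100.0

if __name__ == "__main__":
    for digits, with_ms, without_ms in STEP1_RUNS:
        print(f"C{digits}: {asm_redc_gain(with_ms, without_ms):.1f}% saved in Step 1")
```

These are single runs on one machine, so treat the percentages as indicative only.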
Back in version 6.1.1 when I tested it, --enable-asm-redc was faster up to around 190-200 digit numbers. Above that it was faster without the switch.
[URL="http://www.mersenneforum.org/showpost.php?p=84051&postcount=28"]http://www.mersenneforum.org/showpost.php?p=84051&postcount=28[/URL] |
[QUOTE=ATH;171314]Back in version 6.1.1 when I tested it, --enable-asm-redc was faster up to around 190-200 digit numbers. Above that it was faster without the switch.[/QUOTE]
Yes, I see that now: for 241 and 305 digit numbers, --enable-asm-redc is slower.
[CODE]*** Windows 32bit ECM BENCHMARK ***

===================================================
./configure --with-gmp=/home/Jeff/gmp-4.3.0/ --enable-asm-redc --enable-sse2 --build=pentium4-pc-cygwin
===================================================
GMP-ECM 6.2.3 [powered by GMP 4.3.0] [ECM]
Input number is 34269735914710278317669605703327411832660646302583686987227257506317419684607569302624515641501281679267924130251301180432079060240764836074934702194735093660818358068090036670915897316672819579451986178724280110948350208426589224080751249 (241 digits)
Using B1=20000000, B2=70272304840, polynomial Dickson(12), sigma=980060817
Step 1 took 463572ms
Step 2 took 88967ms

real 9m26.480s
user 9m12.554s
sys 0m0.233s

===================================================
./configure --with-gmp=/home/Jeff/gmp-4.3.0/ --enable-sse2 --build=pentium4-pc-cygwin
===================================================
GMP-ECM 6.2.3 [powered by GMP 4.3.0] [ECM]
Input number is 3426973591471027831766960570332741183266064630258368698722725750631741968460756930286245156415012816792679241302513011804320790602407648360749347021947350936608183580680900366709158975316672819579451986178724280110948350208426589224080751249 (241 digits)
Using B1=20000000, B2=70272304840, polynomial Dickson(12), sigma=980060817
Step 1 took 394542ms
Step 2 took 84678ms

real 8m11.034s
user 7m59.250s
sys 0m0.108s

===================================================
./configure --with-gmp=/home/Jeff/gmp-4.3.0/ --enable-asm-redc --enable-sse2 --build=pentium4-pc-cygwin
===================================================
GMP-ECM 6.2.3 [powered by GMP 4.3.0] [ECM]
Input number is 20328474162529416647311982959537843499999146827155841098525615602167851195804786454820231727377603285082682529078490663751950024881559689820074466693727520628346302576066843023544023886449021575509433349099599424248755063838567414089807242454201325599982322964639626679563755005084965722261867838430023049 (305 digits)
Using B1=20000000, B2=73514040616, polynomial Dickson(12), sigma=980060817
Step 1 took 696279ms
Step 2 took 127890ms

real 14m4.983s
user 13m44.184s
sys 0m1.309s

===================================================
./configure --with-gmp=/home/Jeff/gmp-4.3.0/ --enable-sse2 --build=pentium4-pc-cygwin
===================================================
GMP-ECM 6.2.3 [powered by GMP 4.3.0] [ECM]
Input number is 20328474162529416647311982959537843499999146827155841098525615602167851195804786454820231727377603285082682529078490663751950024881559689820074466693727520628346302576066843023544023886449021575509433349099599424248755063838567414089807242454201325599982322964639626679563755005084965722261867838430023049 (305 digits)
Using B1=20000000, B2=73514040616, polynomial Dickson(12), sigma=980060817
Step 1 took 572118ms
Step 2 took 121634ms

real 11m53.750s
user 11m33.782s
sys 0m1.123s[/CODE]
So essentially that means if you want the fastest code, you need to have two binaries and call the right one depending on the size of the number you are testing. |
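Until the switch happens inside GMP-ECM itself, the "two binaries" idea could be wrapped in a small launcher. Below is a minimal Python sketch, not a tested tool: the binary names ./ecm-asmredc and ./ecm-plain are hypothetical, and the 192-digit crossover is only an assumption based on the 190-200 digit range reported in this thread, so measure it on your own machine. The command line is the same `ecm -c 25 1e6 < in.txt` usage shown earlier in the thread.

```python
import subprocess

# Hypothetical paths: one build configured with --enable-asm-redc, one without.
ASM_REDC_BINARY = "./ecm-asmredc"
PLAIN_BINARY = "./ecm-plain"

# Assumed crossover in decimal digits; the thread reports roughly 190-200.
CROSSOVER_DIGITS = 192

def pick_binary(n: int, crossover: int = CROSSOVER_DIGITS) -> str:
    """Choose the asm-redc build for small inputs and the plain build otherwise."""
    digits = len(str(abs(n)))
    return ASM_REDC_BINARY if digits <= crossover else PLAIN_BINARY

def run_ecm(n: int, b1: str = "1e6", curves: int = 25) -> subprocess.CompletedProcess:
    """Run `ecm -c CURVES B1` with n (followed by a newline) on standard input."""
    # No check=True here: a non-zero exit status is not treated as an error.
    return subprocess.run(
        [pick_binary(n), "-c", str(curves), b1],
        input=f"{n}\n",
        text=True,
        capture_output=True,
    )

if __name__ == "__main__":
    c85 = 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561
    print(pick_binary(c85))
```

Calling run_ecm(c85) would then start 25 curves at B1 = 1e6 with whichever build pick_binary selects, and the captured output is available in the returned object's stdout.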
[quote=Jeff Gilchrist;171354]So essentially that means if you want the fastest code you need to have two binaries and call the right one depending on the size of the number you are testing.[/quote]
i.e. gmp-ecm needs to be modified so that isn't true |
[quote=henryzz;171368]i.e. gmp-ecm needs to be modified so that isn't true[/quote]
6.2.3.1? That's a mouthful! |
[quote=10metreh;171372]6.2.3.1? That's a mouthful![/quote]
it will probably be 6.2.4 |
[QUOTE=henryzz;171368]i.e. gmp-ecm needs to be modified so that isn't true[/QUOTE]
I took a brief look at the mpmod.c code again and there's a lot of weird stuff going on. It had lots of bits and pieces added over the years and some of them don't really fit together. I won't get to do anything about it for quite a while, though - thesis stuff with hard deadlines first. But feel free to mess with the code - it's open source! (Embarrassing as that may be in various sections of GMP-ECM...) Alex |
[quote=akruppa;171384]I took a brief look at the mpmod.c code again and there's a lot of weird stuff going on. It had lots of bits and pieces added over the years and some of them don't really fit together. I won't get to do anything about it for quite a while, though - thesis stuff with hard deadlines first. But feel free to mess with the code - it's open source! (Embarrassing as that may be in various sections of GMP-ECM...)
Alex[/quote] i actually meant a quick hack that meant it used gmp/MPIR for anything larger than 192.5 digits |
Hmm. Try inserting
[CODE]
if (nn > 20)
  {
    mpz_mul (modulus->temp1, S1, S2);
    ecm_redc_basecase (R, modulus->temp1, modulus);
    return;
  }
[/CODE]
in mpmod.c, line 393 (after the variable definitions)

Alex |
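The cutoff of 20 lines up with the figures quoted earlier in the thread, assuming that nn counts the limbs of the modulus and that a limb is 32 bits wide on these 32-bit builds (both are assumptions on my part, not something stated in the post). A short Python check of the arithmetic, with made-up function names:

```python
import math

def max_digits(limbs: int, bits_per_limb: int = 32) -> float:
    """log10 of 2^(limbs * bits_per_limb), the first value that no longer fits."""
    return limbs * bits_per_limb * math.log10(2)

def limbs_needed(n: int, bits_per_limb: int = 32) -> int:
    """Number of limbs needed to store the positive integer n."""
    return max(1, -(-n.bit_length() // bits_per_limb))  # ceiling division

if __name__ == "__main__":
    print(f"20 limbs of 32 bits hold numbers below 10^{max_digits(20):.2f}")
    print(f"2^640 is about {float(2**640):.2e}")
```

Under those assumptions, 20 limbs cover everything below 2^640, which is about 4.6*10^192. That matches both the "> 4.6*10^192" observation and the "192.5 digits" figure, so `nn > 20` would hand larger inputs to the GMP-only path.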