Newer msieves are slow on Core i7
Hi,
it seems v1.39 and v1.38 (possibly earlier versions as well) are having some kind of trouble with Core i7 processors, at least when using the QS. They're about 20% slower than ye olde v1.16 on this cpu, as opposed to 5-10% faster on a Core (1) processor. I take it "using generic 32kb sieve core" has something to do with it? I've also tried the win64 version of v1.39 that Jeff Gilchrist has compiled (thank you both!) with similarly poor performance "using VC8 32kb sieve core".
I'm guessing the Core i7 would run better with the "32kb Intel Core sieve core". Is there any way, short of fiddling about and compiling it myself, to force another sieve core? If not, do you have any plans to improve the i7 performance?
[...]
I actually did try compiling it under VC++ 2008 Express, but there seems to be some dependencies on gmp in the gnfs part unless I'm missing something. Is getting gmp to compile under VC still a big mess or has that changed the last couple of years?
Looking at the code you're comparing the max supported cpuid call to 10 to determine if it's a Core cpu. The i7 apparently returns 11. Hacking your v1.39 windows binary at offset 0x18ec to 11 instead of 10 seems to make it use the Core core on my i7. At least it prints "using 32kb Intel Core sieve core". Curiously though, it's still running at the same speed as the unmodified v1.39. Possibly very marginally faster, but then insignificantly so compared to the gap between it and v1.16.
Some more tweaking reveals that the "64kb Pentium 4 sieve core" is halfway between unmodified v1.39 and v1.16 in speed... v1.16 seems to use 64KB sieving blocks. Looks to me like the i7 doesn't really like your fancy new 32KB blocks.
Any thoughts before I waste more time? :)
cheers,
Mikael |
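For readers following the CPUID discussion, here is a minimal sketch of why a detection rule keyed to one maximum-leaf value misses a newer chip. It assumes the check is an exact comparison against 10, as the post describes; the enum and function names are hypothetical and are not taken from the msieve sources. `__get_cpuid_max` comes from the `<cpuid.h>` header shipped with GCC and Clang, so that part is compiled only on x86 builds with those compilers.

```c
#include <stdio.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>   /* GCC/Clang helper header, not available under MSVC */
#endif

/* Hypothetical labels for illustration; msieve's real names are not shown in the thread. */
enum sieve_core { CORE_GENERIC, CORE_INTEL_CORE };

/* Brittle rule: only an exact max-leaf value of 10 counts as "Core",
   so a CPU reporting 11 falls through to the generic sieve core. */
enum sieve_core pick_core_exact(unsigned int max_leaf)
{
    return (max_leaf == 10) ? CORE_INTEL_CORE : CORE_GENERIC;
}

/* More tolerant rule (an assumption, not the author's fix):
   treat any max-leaf value of 10 or above as Core-class. */
enum sieve_core pick_core_at_least(unsigned int max_leaf)
{
    return (max_leaf >= 10) ? CORE_INTEL_CORE : CORE_GENERIC;
}

/* Highest supported basic CPUID leaf, or 0 when it cannot be queried here. */
unsigned int max_basic_cpuid_leaf(void)
{
#if defined(__i386__) || defined(__x86_64__)
    return __get_cpuid_max(0, NULL);
#else
    return 0;
#endif
}
```

Calling `pick_core_exact(max_basic_cpuid_leaf())` on a machine that reports 11 would select the generic core, which matches the behaviour described in the post. The maximum leaf is a weak proxy for the microarchitecture, so a real fix would more likely look at the family and model fields instead.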
The CPU identification does need to be better. As for the performance difference, if you're in a position to compile from source try changing HAVE_CMOV in include/mp.h to HAS_CMOV. This was a typo that turned off tiny snippets of code in performance-critical inner loops everywhere in the code, so maybe it would make up some of the difference.
See the readme in the MSVC project directory for compiling without GMP and GMP-ECM. Also, note that the code needs some tiny changes to compile with the MSVC Express compiler, and those changes are only in my local sources right now. |
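To illustrate how a one-word typo can switch off code without any diagnostic, here is a small sketch. It assumes the build defines `HAS_CMOV` while the guard tests `HAVE_CMOV`, as the post above implies; the `FAST_PATH_COMPILED` macros and the two helper functions are made up for this example and do not appear in msieve.

```c
/* The build configuration defines this name ... */
#define HAS_CMOV 1

/* ... but the guard tests a different spelling. An undefined macro is
   simply "not defined" to the preprocessor, so the fast path is dropped
   silently and the fallback is compiled instead. */
#if defined(HAVE_CMOV)
#define FAST_PATH_COMPILED 1
#else
#define FAST_PATH_COMPILED 0
#endif

/* The same guard with the spelling that matches the definition. */
#if defined(HAS_CMOV)
#define FIXED_FAST_PATH_COMPILED 1
#else
#define FIXED_FAST_PATH_COMPILED 0
#endif

/* Report which variant ended up in the binary. */
int fast_path_compiled(void)       { return FAST_PATH_COMPILED; }
int fixed_fast_path_compiled(void) { return FIXED_FAST_PATH_COMPILED; }
```

Because both spellings are valid identifiers, the compiler has nothing to complain about. The only symptom is slower code, which is why this kind of slip tends to surface in benchmarks rather than in build logs.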
[quote=mklasson;162756]Hi,
it seems v1.39 and v1.38 (possibly earlier versions as well) are having some kind of trouble with Core i7 processors, at least when using the QS. They're about 20% slower than ye olde v1.16 on this cpu, as opposed to 5-10% faster on a Core (1) processor. I take it "using generic 32kb sieve core" has something to do with it? I've also tried the win64 version of v1.39 that Jeff Gilchrist has compiled (thank you both!) with similarly poor performance "using VC8 32kb sieve core". I'm guessing the Core i7 would run better with the "32kb Intel Core sieve core". Is there any way, short of fiddling about and compiling it myself, to force another sieve core? If not, do you have any plans to improve the i7 performance? [...] I actually did try compiling it under VC++ 2008 Express, but there seems to be some dependencies on gmp in the gnfs part unless I'm missing something. Is getting gmp to compile under VC still a big mess or has that changed the last couple of years? Looking at the code you're comparing the max supported cpuid call to 10 to determine if it's a Core cpu. The i7 apparently returns 11. Hacking your v1.39 windows binary at offset 0x18ec to 11 instead of 10 seems to make it use the Core core on my i7. At least it prints "using 32kb Intel Core sieve core". Curiously though, it's still running at the same speed as the unmodified v1.39. Possibly very marginally faster, but then insignificantly so compared to the gap between it and v1.16. Some more tweaking reveals that the "64kb Pentium 4 sieve core" is halfway between unmodified v1.39 and v1.16 in speed... v1.16 seems to use 64KB sieving blocks. Looks to me like the i7 doesn't really like your fancy new 32KB blocks. Any thoughts before I waste more time? :) cheers, Mikael[/quote] Have a go at Yafu. It might be faster. |
[QUOTE=jasonp;162768]if you're in a position to compile from source try changing HAVE_CMOV in include/mp.h to HAS_CMOV.[/QUOTE]
I'm not, yet, but maybe I'll get it working...
[quote]See the readme in the MSVC project directory for compiling without GMP and GMP-ECM.[/quote]
I've tried that, but there still seems to be stuff in gnfs/poly/ that needs gmp. Ah well, I should probably just try to find a compilable gmp.
[quote=10metreh]Have a go at Yafu. It might be faster.[/quote]
Indeed! Yafu v1.06 win64 looks about 20-25% faster than msieve v1.16 on the i7, while the win32 binary is only slightly faster than msieve. On my Core 1 the roles are reversed with msieve v1.16 taking a 15% lead over yafu win32. |
[QUOTE=mklasson;162792]I'm not, yet, but maybe I'll get it working...
I've tried that, but there still seems to be stuff in gnfs/poly/ that needs gmp.[/QUOTE]
Compiling GMP on Windows is fairly painless thanks to Brian Gladman. Download the [URL="http://gmplib.org/"]latest GMP[/URL] from the main website, then download the MSVC project files from [URL="http://gladman.plushost.co.uk/oldsite/computing/gmp4win.php"]Brian Gladman's site[/URL] and extract them into the gmp-4.2.4/ directory and read the instructions.
Jeff. |
I fixed the source and re-benched out of curiosity on my Q9550 Core 2. It seems the 64bit code isn't really affected by the changes.
[CODE]SIQS (C80 = 43756152090407155008788902702412144383525640641502974083054213255054353547943661)
=============================================================================================
YAFU 1.06 64bit MSVC    = 4m 02.683s
YAFU 1.06 32bit gcc-k8  = 4m 57.278s
MSIEVE 1.39 32bit gcc   = 5m 01.313s [HAS_CMOV]
MSIEVE 1.39 32bit gcc   = 5m 41.537s [HAVE_CMOV]
MSIEVE 1.39 64bit MSVC  = 6m 07.721s [HAS_CMOV]
MSIEVE 1.39 64bit MSVC  = 6m 08.474s [HAVE_CMOV]

SIQS (C75 = 281396163585532137380297959872159569353696836686080935550459706878100362721)
========================================================================================
YAFU 1.06 64bit MSVC    = 1m 36.408s
MSIEVE 1.39 32bit gcc   = 1m 57.033s [HAS_CMOV]
YAFU 1.06 32bit gcc-k8  = 2m 00.339s
MSIEVE 1.39 32bit gcc   = 2m 06.055s [HAVE_CMOV]
MSIEVE 1.39 64bit MSVC  = 2m 16.717s [HAS_CMOV]
MSIEVE 1.39 64bit MSVC  = 2m 18.905s [HAVE_CMOV]
[/CODE]
So the fix seems to have made a big difference on the C80 and a noticeable one on the C75 (pushing the 32bit version ahead of 32bit YAFU). |
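To put numbers on "big" and "noticeable", the 32-bit gcc timings above can be converted to seconds and compared. The sketch below is only a convenience for that arithmetic, and the helper names are invented for this example. With the figures from the table, the C80 run drops from 341.537 s to 301.313 s, roughly 12% less time, and the C75 run drops from 126.055 s to 117.033 s, roughly 7% less.

```c
#include <stdio.h>

/* Parse a time written like "5m 01.313s" into seconds.
   Returns -1.0 when the text does not match that shape. */
double parse_time_seconds(const char *text)
{
    int minutes;
    double seconds;

    if (sscanf(text, "%dm %lfs", &minutes, &seconds) != 2)
        return -1.0;
    return minutes * 60.0 + seconds;
}

/* Percentage of run time saved when going from `before` to `after`. */
double percent_saved(double before, double after)
{
    return 100.0 * (before - after) / before;
}
```

For example, `percent_saved(parse_time_seconds("5m 41.537s"), parse_time_seconds("5m 01.313s"))` gives the C80 figure. The same calculation on the 64-bit MSVC rows comes out well under 2%, which fits the observation that the 64-bit code isn't really affected.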
[QUOTE=Jeff Gilchrist;162807]Compiling GMP on Windows is fairly painless thanks to Brian Gladman. Download the [URL="http://gmplib.org/"]latest GMP[/URL] from the main website, then download the MSVC project files from [URL="http://gladman.plushost.co.uk/oldsite/computing/gmp4win.php"]Brian Gladman's site[/URL] and extract them into the gmp-4.2.4/ directory and read the instructions.[/QUOTE]
Great, thanks! I notice his readme suggests VS2008 Pro or higher. From what I've been able to read the differences between Standard and Pro are pretty insignificant unless maybe you're in a big corporate setting with a million coworkers all doing very strange things. I suspect MS feel the same way as they seem to have pulled the once-existing page describing the differences explicitly... :) There's a substantial price hike between the Std and Pro versions though. Does anyone know of a good reason to go with the Pro? It's a shame you can't buy only VC++ anymore. I have no use whatsoever for 80+ % of the VS content... |
[QUOTE=mklasson;163067]Great, thanks! I notice his readme suggests VS2008 Pro or higher. From what I've been able to read the differences between Standard and Pro are pretty insignificant unless maybe you're in a big corporate setting with a million coworkers all doing very strange things. I suspect MS feel the same way as they seem to have pulled the once-existing page describing the differences explicitly... :)[/QUOTE]
The free Express edition can't compile 64bit code, but like you said, I think most of the features the average developer needs are in the Standard edition. |
[QUOTE=Jeff Gilchrist;162811]I fixed the source and re-benched out of curiosity on my Q9550 Core 2. It seems the 64bit code isn't really affected by the changes.[/QUOTE]
I wouldn't expect the 64-bit code to be affected, since there are no 64-bit processors without CMOV. |
I know what happened too; 32-bit x86 uses explicit CMOV instructions, and the asm statement that inserted them was modified when akruppa needed to fix a bug in GMP-ECM. In that library the asm statement was guarded with HAVE_CMOV, and I inadvertently copied that when I cut-n-pasted his version into msieve a few releases ago.
64-bit x86 uses generic C, and will use CMOV instructions anyway. It's possible that recent gcc will also use CMOVs without special prompting, but I haven't verified that. |
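As a rough illustration of the kind of inner-loop snippet where a conditional move helps, here is a modular subtraction written two ways in plain C. This is not msieve's code and the function names are hypothetical. Whether a given compiler turns either form into a CMOV depends on the compiler version, the target and the optimisation flags, so treat it as a sketch of the idea rather than a guarantee.

```c
#include <stdint.h>

/* (a - b) mod p for 0 <= a, b < p, written with an ordinary branch. */
uint32_t modsub_branch(uint32_t a, uint32_t b, uint32_t p)
{
    uint32_t t = a - b;      /* wraps around modulo 2^32 when a < b */

    if (a < b)
        t += p;              /* undo the wraparound */
    return t;
}

/* Same result phrased as a select, with no branch in the source. */
uint32_t modsub_select(uint32_t a, uint32_t b, uint32_t p)
{
    uint32_t t = a - b;
    uint32_t mask = (uint32_t)0 - (uint32_t)(a < b);  /* all ones if a < b, else 0 */

    return t + (p & mask);   /* adds p only when the subtraction wrapped */
}
```

In a sieve inner loop the comparison outcome is close to random, so a mispredicted branch can be costly. That is the motivation for forcing a CMOV with inline asm on 32-bit x86 and for the guard macro that was accidentally misspelled.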