#1 |
Feb 2004
402₈ Posts |
Hi,
it seems v1.39 and v1.38 (possibly earlier versions as well) are having some kind of trouble with Core i7 processors, at least when using the QS. They're about 20% slower than ye olde v1.16 on this CPU, as opposed to 5-10% faster on a Core (1) processor. I take it "using generic 32kb sieve core" has something to do with it? I've also tried the win64 version of v1.39 that Jeff Gilchrist has compiled (thank you both!) with similarly poor performance "using VC8 32kb sieve core". I'm guessing the Core i7 would run better with the "32kb Intel Core sieve core". Is there any way, short of fiddling about and compiling it myself, to force another sieve core? If not, do you have any plans to improve the i7 performance?

[...]

I actually did try compiling it under VC++ 2008 Express, but there seem to be some dependencies on gmp in the gnfs part, unless I'm missing something. Is getting gmp to compile under VC still a big mess, or has that changed in the last couple of years?

Looking at the code, you're comparing the max supported cpuid call to 10 to determine if it's a Core CPU. The i7 apparently returns 11. Hacking your v1.39 Windows binary at offset 0x18ec to 11 instead of 10 seems to make it use the Core core on my i7. At least it prints "using 32kb Intel Core sieve core". Curiously though, it's still running at the same speed as the unmodified v1.39. Possibly very marginally faster, but insignificantly so compared to the gap between it and v1.16.

Some more tweaking reveals that the "64kb Pentium 4 sieve core" is halfway between unmodified v1.39 and v1.16 in speed... v1.16 seems to use 64KB sieving blocks. Looks to me like the i7 doesn't really like your fancy new 32KB blocks. Any thoughts before I waste more time? :)

cheers,
Mikael |
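To illustrate the CPUID observation in the post above, here is a minimal sketch of why a check against exactly 10 would miss a CPU that reports 11. It assumes the detection is an exact match on the maximum supported CPUID leaf, which is only what the post suggests; the enum and function names are hypothetical and are not taken from the msieve sources.

```c
/* Hypothetical labels for two of the sieve cores named in the log output. */
enum sieve_core { CORE_GENERIC_32K, CORE_INTEL_CORE_32K };

/* Exact match (assumed behaviour): only a max CPUID leaf of 10
   selects the Core-tuned code; every other value gets the generic path. */
static enum sieve_core pick_core_exact(unsigned max_leaf)
{
    return (max_leaf == 10) ? CORE_INTEL_CORE_32K : CORE_GENERIC_32K;
}

/* Range check (one possible alternative): 10 and anything newer
   is treated as Core-class. */
static enum sieve_core pick_core_at_least(unsigned max_leaf)
{
    return (max_leaf >= 10) ? CORE_INTEL_CORE_32K : CORE_GENERIC_32K;
}
```

With a reported value of 11, `pick_core_exact` falls back to the generic core while `pick_core_at_least` picks the Core-tuned one. On GCC or Clang the input could come from `__get_cpuid_max(0, NULL)` in `<cpuid.h>`. A range check is still a crude heuristic, and as the post notes, selecting the Core core did not by itself close the speed gap.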
#2 |
Tribal Bullet
Oct 2004
3559₁₀ Posts |
The CPU identification does need to be better. As for the performance difference, if you're in a position to compile from source try changing HAVE_CMOV in include/mp.h to HAS_CMOV. This was a typo that turned off tiny snippets of code in performance-critical inner loops everywhere in the code, so maybe it would make up some of the difference.
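A minimal sketch of how a misspelled guard macro can silently switch code off, which is the kind of typo described above. It assumes the build defines `HAS_CMOV` while the affected code tests `HAVE_CMOV`; the function names are hypothetical and the bodies stand in for the real inner-loop snippets.

```c
/* Assumption: the build system defines HAS_CMOV on CPUs with conditional moves. */
#define HAS_CMOV 1

/* Guard spelled the way the build defines it: the fast path is compiled in. */
static int fast_path_with_correct_guard(void)
{
#if defined(HAS_CMOV)
    return 1;   /* CMOV-based code would go here */
#else
    return 0;   /* generic fallback */
#endif
}

/* Guard misspelled as HAVE_CMOV: nothing defines that name, so the
   preprocessor drops the fast path and keeps the fallback, with no warning. */
static int fast_path_with_misspelled_guard(void)
{
#if defined(HAVE_CMOV)
    return 1;   /* never compiled in this build */
#else
    return 0;   /* generic fallback is used instead */
#endif
}
```

Both functions compile cleanly, which is why this sort of mistake tends to show up only as a performance difference.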
See the readme in the MSVC project directory for compiling without GMP and GMP-ECM. Also, note that the code needs some tiny changes to compile with the MSVC Express compiler, and those changes are only in my local sources right now.

Last fiddled with by jasonp on 2009-02-14 at 02:55 |
#3 | |
Nov 2008
2×3³×43 Posts |
Quote:
#4 | |||
Feb 2004
2·3·43 Posts |
Quote:
Quote:
Ah well, I should probably just try to find a compilable gmp.
Quote:
#5 | |
Jun 2003
Ottawa, Canada
1173₁₀ Posts |
Quote:
Jeff. |
#6 |
Jun 2003
Ottawa, Canada
3×17×23 Posts |
I fixed the source and re-benched out of curiosity on my Q9550 Core 2. It seems the 64bit code isn't really affected by the changes.
Code:
    SIQS (C80 = 43756152090407155008788902702412144383525640641502974083054213255054353547943661)
    =============================================================================================
    YAFU 1.06 64bit MSVC     = 4m 02.683s
    YAFU 1.06 32bit gcc-k8   = 4m 57.278s
    MSIEVE 1.39 32bit gcc    = 5m 01.313s  [HAS_CMOV]
    MSIEVE 1.39 32bit gcc    = 5m 41.537s  [HAVE_CMOV]
    MSIEVE 1.39 64bit MSVC   = 6m 07.721s  [HAS_CMOV]
    MSIEVE 1.39 64bit MSVC   = 6m 08.474s  [HAVE_CMOV]

    SIQS (C75 = 281396163585532137380297959872159569353696836686080935550459706878100362721)
    ========================================================================================
    YAFU 1.06 64bit MSVC     = 1m 36.408s
    MSIEVE 1.39 32bit gcc    = 1m 57.033s  [HAS_CMOV]
    YAFU 1.06 32bit gcc-k8   = 2m 00.339s
    MSIEVE 1.39 32bit gcc    = 2m 06.055s  [HAVE_CMOV]
    MSIEVE 1.39 64bit MSVC   = 2m 16.717s  [HAS_CMOV]
    MSIEVE 1.39 64bit MSVC   = 2m 18.905s  [HAVE_CMOV]

Last fiddled with by Jeff Gilchrist on 2009-02-14 at 13:39 |
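To put the timings above in relative terms, a small sketch that converts the "minutes, seconds" figures to seconds and computes how much slower one run is than another. The helper names are hypothetical; the inputs are the numbers from the table.

```c
/* Convert a timing given as minutes and seconds into seconds. */
static double to_seconds(int minutes, double seconds)
{
    return 60.0 * minutes + seconds;
}

/* Percentage by which the run taking `slow` seconds exceeds the one taking `fast`. */
static double percent_slower(double slow, double fast)
{
    return 100.0 * (slow - fast) / fast;
}
```

For the C80 with 32-bit gcc, `percent_slower(to_seconds(5, 41.537), to_seconds(5, 1.313))` comes out at roughly 13%, and for the C75 at roughly 8%. The same comparison for the 64-bit MSVC C80 runs gives about 0.2%, consistent with the remark that the 64-bit code isn't really affected.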
#7 | |
Feb 2004
100000010₂ Posts |
Quote:
There's a substantial price hike between the Std and Pro versions though. Does anyone know of a good reason to go with the Pro? It's a shame you can't buy only VC++ anymore. I have no use whatsoever for 80+ % of the VS content... |
#8 | |
Jun 2003
Ottawa, Canada
495₁₆ Posts |
Quote:
#9 |
(loop (#_fork))
Feb 2006
Cambridge, England
2×7×461 Posts |
#10 |
Tribal Bullet
Oct 2004
3,559 Posts |
I know what happened too; 32-bit x86 uses explicit CMOV instructions, and the asm statement that inserted them was modified when akruppa needed to fix a bug in GMP-ECM. In that library the asm statement was guarded with HAVE_CMOV, and I inadvertently copied that when I cut-n-pasted his version into msieve a few releases ago.
64-bit x86 uses generic C, and will use CMOV instructions anyway. It's possible that recent gcc will also use CMOVs without special prompting, but I haven't verified that. |
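As an illustration of "generic C" that a compiler may turn into a conditional move, here is a sketch of a modular subtraction written two ways. This is not the msieve or GMP-ECM code; the function names are hypothetical, and whether a given compiler actually emits CMOV for either form depends on the compiler and its flags.

```c
#include <stdint.h>

/* (a - b) mod n for a, b < n, written as a plain conditional expression.
   Compilers are free to lower this to a conditional move, but need not. */
static uint32_t modsub_ternary(uint32_t a, uint32_t b, uint32_t n)
{
    /* a - b wraps modulo 2^32 when a < b; adding n brings it back into range. */
    return (a >= b) ? (a - b) : (a - b + n);
}

/* The same result computed with a mask, so the source contains no branch at all. */
static uint32_t modsub_masked(uint32_t a, uint32_t b, uint32_t n)
{
    uint32_t t = a - b;                                /* wraps if a < b */
    uint32_t mask = (uint32_t)0 - (uint32_t)(a < b);   /* all ones if a < b, else 0 */
    return t + (n & mask);                             /* add n only after a wrap */
}
```

Inspecting the generated assembly (for example with `gcc -O2 -S`) is the only reliable way to tell which instructions a particular build ends up using.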