mersenneforum.org  

2009-02-14, 00:00   #1
mklasson
 
Feb 2004

402₈ Posts
Newer msieves are slow on Core i7

Hi,

it seems v1.39 and v1.38 (possibly earlier versions as well) are having some kind of trouble with Core i7 processors, at least when using the QS.

They're about 20% slower than ye olde v1.16 on this cpu, as opposed to 5-10% faster on a Core (1) processor.

I take it "using generic 32kb sieve core" has something to do with it?

I've also tried the win64 version of v1.39 that Jeff Gilchrist has compiled (thank you both!) with similarly poor performance "using VC8 32kb sieve core".

I'm guessing the Core i7 would run better with the "32kb Intel Core sieve core". Is there any way, short of fiddling about and compiling it myself, to force another sieve core? If not, do you have any plans to improve the i7 performance?

[...]

I actually did try compiling it under VC++ 2008 Express, but there seem to be some dependencies on gmp in the gnfs part, unless I'm missing something. Is getting gmp to compile under VC still a big mess, or has that changed in the last couple of years?

Looking at the code you're comparing the max supported cpuid call to 10 to determine if it's a Core cpu. The i7 apparently returns 11.
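
As a rough illustration of the check described above, here is a minimal C sketch. It is not msieve's actual detection code: the enum and function names are hypothetical, and it assumes the comparison is an exact equality test against 10. `__get_cpuid_max` comes from the `<cpuid.h>` header shipped with GCC and Clang.

```c
#include <stddef.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
/* Highest basic CPUID leaf the processor supports (EAX of leaf 0). */
static unsigned max_basic_cpuid_leaf(void)
{
    return __get_cpuid_max(0, NULL);
}
#else
/* Not an x86 build with GCC/Clang: report "unknown" as 0. */
static unsigned max_basic_cpuid_leaf(void)
{
    return 0;
}
#endif

/* Hypothetical names, for illustration only. */
enum sieve_core { CORE_GENERIC, CORE_INTEL_CORE };

/* Assumed form of the check: only a max leaf of exactly 10 selects the
   Core routine, so a CPU reporting 11 falls back to the generic one. */
static enum sieve_core pick_core_exact(unsigned max_leaf)
{
    return max_leaf == 10 ? CORE_INTEL_CORE : CORE_GENERIC;
}

/* Looser variant: anything reporting at least leaf 10 is treated as
   Core-like. Still a heuristic, not a real model check. */
static enum sieve_core pick_core_at_least(unsigned max_leaf)
{
    return max_leaf >= 10 ? CORE_INTEL_CORE : CORE_GENERIC;
}
```

Calling `pick_core_exact(max_basic_cpuid_leaf())` on an i7 that reports 11 would pick the generic routine, while the looser variant would pick the Core one. Neither is a robust identification; the family and model fields from CPUID leaf 1 are the more usual basis for that.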

Hacking your v1.39 windows binary at offset 0x18ec to 11 instead of 10 seems to make it use the Core core on my i7. At least it prints "using 32kb Intel Core sieve core". Curiously though, it's still running at the same speed as the unmodified v1.39. Possibly very marginally faster, but then insignificantly so compared to the gap between it and v1.16.

Some more tweaking reveals that the "64kb Pentium 4 sieve core" is halfway between unmodified v1.39 and v1.16 in speed... v1.16 seems to use 64KB sieving blocks. Looks to me like the i7 doesn't really like your fancy new 32KB blocks.
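
For readers unfamiliar with what a "sieving block" is, here is a minimal C sketch of a blocked sieve update. It is a generic illustration, not msieve's sieve core: the `fb_entry` struct and `sieve_in_blocks` function are hypothetical names, and the real routines are far more elaborate. The only point is that the block size is a tunable parameter (32*1024 or 64*1024 bytes, say) that changes the memory access pattern but not the result.

```c
#include <stddef.h>
#include <stdint.h>

/* One factor-base entry (hypothetical layout): the prime, a rounded
   logarithm of it, and the next sieve offset this prime will hit.
   The offset is carried over from one block to the next. */
typedef struct {
    uint32_t prime;
    uint8_t  logp;
    size_t   next;
} fb_entry;

/* Add logp at every position hit by each prime, touching the sieve
   array one block at a time so the active region can stay in cache. */
static void sieve_in_blocks(uint8_t *sieve, size_t len, size_t block_size,
                            fb_entry *fb, size_t nfb)
{
    if (block_size == 0)
        return;                      /* avoid an endless outer loop */

    for (size_t base = 0; base < len; base += block_size) {
        size_t end = base + block_size;
        if (end > len)
            end = len;               /* last block may be shorter */

        for (size_t i = 0; i < nfb; i++) {
            size_t pos = fb[i].next;
            while (pos < end) {
                sieve[pos] += fb[i].logp;
                pos += fb[i].prime;
            }
            fb[i].next = pos;        /* resume here in the next block */
        }
    }
}
```

Running this with a block size of 32768 and again with 65536 fills the array identically. Which size runs faster on a given CPU depends on its cache hierarchy and has to be measured, which is what the timings above amount to.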

Any thoughts before I waste more time? :)

cheers,
Mikael
2009-02-14, 02:46   #2
jasonp
Tribal Bullet
 
Oct 2004

3559₁₀ Posts

The CPU identification does need to be better. As for the performance difference, if you're in a position to compile from source try changing HAVE_CMOV in include/mp.h to HAS_CMOV. This was a typo that turned off tiny snippets of code in performance-critical inner loops everywhere in the code, so maybe it would make up some of the difference.
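
To see why a one-word macro typo can go unnoticed, here is a small C sketch. The `*_GUARD_ACTIVE` macros and `active_path` are hypothetical names invented for the illustration, and the `#define HAS_CMOV` line stands in for whatever the build actually sets; only the two macro spellings come from the post above.

```c
/* Assumption for the sketch: the build defines HAS_CMOV. */
#define HAS_CMOV 1

/* Guard spelled differently from the macro the build defines. The
   preprocessor treats the unknown name as undefined and emits no
   diagnostic, so the guarded fast path is silently dropped. */
#if defined(HAVE_CMOV)
#define MISSPELLED_GUARD_ACTIVE 1
#else
#define MISSPELLED_GUARD_ACTIVE 0
#endif

/* Guard spelled the same way as the build macro. */
#if defined(HAS_CMOV)
#define CORRECT_GUARD_ACTIVE 1
#else
#define CORRECT_GUARD_ACTIVE 0
#endif

/* Report which code path a guard of each spelling would compile in. */
static const char *active_path(int guard_active)
{
    return guard_active ? "cmov fast path" : "generic fallback";
}
```

Here `active_path(MISSPELLED_GUARD_ACTIVE)` yields the fallback string and `active_path(CORRECT_GUARD_ACTIVE)` the fast-path one. The program still builds and gives correct answers either way, which is why such a typo only shows up in timings. Note that GCC's `-Wundef` warns about undefined names evaluated in `#if` expressions, but not about `#ifdef` or `defined()` tests like these.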

See the readme in the MSVC project directory for compiling without GMP and GMP-ECM.

Also, note that the code needs some tiny changes to compile with the MSVC Express compiler, and those changes are only in my local sources right now.

Last fiddled with by jasonp on 2009-02-14 at 02:55
2009-02-14, 07:54   #3
10metreh
 
Nov 2008

2×3³×43 Posts

Have a go at Yafu. It might be faster.
2009-02-14, 11:28   #4
mklasson
 
Feb 2004

2·3·43 Posts

Quote:
Originally Posted by jasonp
if you're in a position to compile from source try changing HAVE_CMOV in include/mp.h to HAS_CMOV.
I'm not, yet, but maybe I'll get it working...

Quote:
See the readme in the MSVC project directory for compiling without GMP and GMP-ECM.
I've tried that, but there still seems to be stuff in gnfs/poly/ that needs gmp.

Ah well, I should probably just try to find a compilable gmp.

Quote:
Originally Posted by 10metreh
Have a go at Yafu. It might be faster.
Indeed! Yafu v1.06 win64 looks about 20-25% faster than msieve v1.16 on the i7, while the win32 binary is only slightly faster than msieve. On my Core 1 the roles are reversed with msieve v1.16 taking a 15% lead over yafu win32.
2009-02-14, 13:21   #5
Jeff Gilchrist
 
Jun 2003
Ottawa, Canada

1173₁₀ Posts

Quote:
Originally Posted by mklasson
I'm not, yet, but maybe I'll get it working...

I've tried that, but there still seems to be stuff in gnfs/poly/ that needs gmp.
Compiling GMP on Windows is fairly painless thanks to Brian Gladman. Download the latest GMP from the main website, then download the MSVC project files from Brian Gladman's site and extract them into the gmp-4.2.4/ directory and read the instructions.

Jeff.
2009-02-14, 13:37   #6
Jeff Gilchrist
 
Jun 2003
Ottawa, Canada

3×17×23 Posts

I fixed the source and re-benched out of curiosity on my Q9550 Core 2. It seems the 64bit code isn't really affected by the changes.

Code:
SIQS (C80 = 43756152090407155008788902702412144383525640641502974083054213255054353547943661)
=============================================================================================
YAFU 1.06 64bit MSVC   =  4m 02.683s
YAFU 1.06 32bit gcc-k8 =  4m 57.278s
MSIEVE 1.39 32bit gcc  =  5m 01.313s [HAS_CMOV]
MSIEVE 1.39 32bit gcc  =  5m 41.537s [HAVE_CMOV]
MSIEVE 1.39 64bit MSVC =  6m 07.721s [HAS_CMOV]
MSIEVE 1.39 64bit MSVC =  6m 08.474s [HAVE_CMOV]

SIQS (C75 = 281396163585532137380297959872159569353696836686080935550459706878100362721)
========================================================================================
YAFU 1.06 64bit MSVC   =  1m 36.408s
MSIEVE 1.39 32bit gcc  =  1m 57.033s [HAS_CMOV]
YAFU 1.06 32bit gcc-k8 =  2m 00.339s
MSIEVE 1.39 32bit gcc  =  2m 06.055s [HAVE_CMOV]
MSIEVE 1.39 64bit MSVC =  2m 16.717s [HAS_CMOV]
MSIEVE 1.39 64bit MSVC =  2m 18.905s [HAVE_CMOV]
So the fix seems to have made a big difference on the C80 and a noticeable one on the C75 (pushing the 32bit version ahead of 32bit YAFU).
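
To put numbers on "big" and "noticeable", the timings in the table can be converted to seconds and compared. The C sketch below does that; `parse_time` and `percent_saved` are hypothetical helper names, and the input strings are copied from the 32bit gcc rows above.

```c
#include <stdio.h>

/* Parse a timing such as "5m 01.313s" into seconds.
   Returns a negative value if the string does not match. */
static double parse_time(const char *s)
{
    int minutes = 0;
    double seconds = 0.0;
    if (sscanf(s, "%dm %lfs", &minutes, &seconds) != 2)
        return -1.0;
    return minutes * 60.0 + seconds;
}

/* Percentage of run time saved going from 'before' to 'after'. */
static double percent_saved(double before, double after)
{
    return 100.0 * (before - after) / before;
}

/* Print the saving for one before/after pair from the table. */
static void report(const char *label, const char *before, const char *after)
{
    printf("%s: %.1f%% less time\n", label,
           percent_saved(parse_time(before), parse_time(after)));
}
```

Calling `report("C80 32bit", "5m 41.537s", "5m 01.313s")` and `report("C75 32bit", "2m 06.055s", "1m 57.033s")` works out to roughly a 12% saving on the C80 and roughly 7% on the C75 for the 32bit gcc build.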

Last fiddled with by Jeff Gilchrist on 2009-02-14 at 13:39
2009-02-17, 00:53   #7
mklasson
 
Feb 2004

100000010₂ Posts

Quote:
Originally Posted by Jeff Gilchrist
Compiling GMP on Windows is fairly painless thanks to Brian Gladman. Download the latest GMP from the main website, then download the MSVC project files from Brian Gladman's site and extract them into the gmp-4.2.4/ directory and read the instructions.
Great, thanks! I notice his readme suggests VS2008 Pro or higher. From what I've been able to read the differences between Standard and Pro are pretty insignificant unless maybe you're in a big corporate setting with a million coworkers all doing very strange things. I suspect MS feel the same way as they seem to have pulled the once-existing page describing the differences explicitly... :)

There's a substantial price hike between the Std and Pro versions though. Does anyone know of a good reason to go with the Pro?

It's a shame you can't buy only VC++ anymore. I have no use whatsoever for 80+ % of the VS content...
2009-02-17, 10:53   #8
Jeff Gilchrist
 
Jun 2003
Ottawa, Canada

495₁₆ Posts

Quote:
Originally Posted by mklasson
Great, thanks! I notice his readme suggests VS2008 Pro or higher. From what I've been able to read the differences between Standard and Pro are pretty insignificant unless maybe you're in a big corporate setting with a million coworkers all doing very strange things. I suspect MS feel the same way as they seem to have pulled the once-existing page describing the differences explicitly... :)
The free Express edition can't compile 64bit code, but like you said, I think most of the features the average developer needs are in the Standard edition.
2009-02-17, 11:21   #9
fivemack
(loop (#_fork))
 
Feb 2006
Cambridge, England

2×7×461 Posts

Quote:
Originally Posted by Jeff Gilchrist
I fixed the source and re-benched out of curiosity on my Q9550 Core 2. It seems the 64bit code isn't really affected by the changes.
I wouldn't expect the 64-bit code to be affected, since there are no 64-bit processors without CMOV.
2009-02-18, 12:58   #10
jasonp
Tribal Bullet
 
Oct 2004

3,559 Posts

I know what happened too; 32-bit x86 uses explicit CMOV instructions, and the asm statement that inserted them was modified when akruppa needed to fix a bug in GMP-ECM. In that library the asm statement was guarded with HAVE_CMOV, and I inadvertently copied that when I cut-n-pasted his version into msieve a few releases ago.

64-bit x86 uses generic C, and will use CMOV instructions anyway. It's possible that recent gcc will also use CMOVs without special prompting, but I haven't verified that.
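
As an illustration of the kind of "tiny snippet" involved, here is a C sketch of a modular subtraction written with and without a branch. It is not msieve's code (which, per the post above, uses an inline asm statement on 32-bit x86); the function names are hypothetical, and whether a given compiler turns either form into a CMOV depends on the compiler and its flags.

```c
#include <stdint.h>

/* (a - b) mod p for a, b < p, written with an ordinary branch.
   In a hot inner loop the branch can be hard to predict. */
static uint32_t modsub_branch(uint32_t a, uint32_t b, uint32_t p)
{
    uint32_t t = a - b;          /* wraps around when a < b */
    if (a < b)
        t += p;                  /* bring the result back into [0, p) */
    return t;
}

/* Same result without a branch: build an all-ones mask when a < b
   and use it to add either p or 0. */
static uint32_t modsub_mask(uint32_t a, uint32_t b, uint32_t p)
{
    uint32_t t = a - b;
    uint32_t mask = (uint32_t)0 - (uint32_t)(a < b);
    return t + (p & mask);
}
```

Both functions return the same value for all inputs with a, b < p; for example, with p = 7, subtracting 5 from 3 gives 5 in either version. The difference is only in the instructions generated, which is why a disabled guard shows up as a slowdown and not as wrong factors.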