mersenneforum.org  

Old 2008-11-24, 04:48   #1
petrw1
1976 Toyota Corona years forever!
 
 
"Wayne"
Nov 2006
Saskatchewan, Canada

3·5²·71 Posts
Did I read a post re 4 cores to NOT LL all?

I seem to recall reading on this forum that on a Quad it is not the best use of the PC to do LL tests on all 4 cores; something about overhead on the CPU? or RAM? I think it recommended doing TF on one core?

If I am not imagining things then can someone tell me what the conditions were? i.e. Does it depend on the CPU or RAM Technology or Speed or the OS?

I have an Intel Q9550 (quad core, 2.83 GHz)
with 4GB DDR1066 RAM
and Vista 64.
Old 2008-11-24, 05:30   #2
Uncwilly
6809 > 6502
 
 
"""""""""""""""""""
Aug 2003
101×103 Posts

3×7×17×31 Posts

Quote:
Originally Posted by petrw1 View Post
I seem to recall reading on this forum that on a Quad it is not the best use of the PC to do LL tests on all 4 cores; something about overhead on the CPU? or RAM? I think it recommended doing TF on one core?

If I am not imagining things then can someone tell me what the conditions were? i.e. Does it depend on the CPU or RAM Technology or Speed or the OS?
The limitations in accessing RAM (irrespective of tech.) cause the multiple threads to compete and choke down the throughput. With a quad, I personally would suggest at least 1, maybe 2 threads doing TF, since TF puts less demand on the bus. Others may offer their own suggestions based upon actual experience, but that is my dos centavos.
Old 2008-11-24, 06:07   #3
petrw1
1976 Toyota Corona years forever!
 
 
"Wayne"
Nov 2006
Saskatchewan, Canada

3·5²·71 Posts

You may be right... I switched one core to TF, and the other three doing LL (though still in the P-1 phase) are now running at least 10% faster.
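A rough back-of-the-envelope check of that trade-off, as a sketch only. It assumes all four cores ran LL at the same speed before the switch and that the 10% figure applies equally to each of the three remaining LL cores; the function name is purely illustrative.

```python
def relative_ll_throughput(ll_cores: int, speedup: float) -> float:
    """LL throughput in units of 'one core when all four cores run LL'."""
    return ll_cores * (1.0 + speedup)

# Baseline: 4 cores on LL, each at the (memory-contended) reference speed.
before = relative_ll_throughput(4, 0.0)

# After the switch: 3 cores on LL, each assumed to run 10% faster.
after = relative_ll_throughput(3, 0.10)

# Speedup each of the 3 LL cores would need to match 4 LL cores on LL output alone.
break_even = 4 / 3 - 1

print(f"before: {before:.2f}  after: {after:.2f}  break-even speedup: {break_even:.1%}")
```

Under these assumptions the LL output alone is lower with three cores; whether the switch is a net win depends on how much the TF work on the fourth core is valued.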
Old 2008-11-24, 07:52   #4
Phantomas
 
 
Oct 2008
Germany, Hamburg

5·13 Posts

Hi petrw1,

I think you should give the new v25.8 a try with affinity scrambling set to AffinityScramble=1230 and 2 worker / 2 helper threads for LL tests.

see http://mersenneforum.org/showpost.ph...7&postcount=21
Old 2008-11-24, 11:37   #5
Kevin
 
 
Aug 2002
Ann Arbor, MI

661₈ Posts

With your memory, there's not as much of a performance hit as most people get when running LL tests on all 4 cores. I think it's along the lines of: with DDR2-800 or below, you only get something like 3 cores' worth of output, but with DDR2-1066 it's closer to 3.5 cores' worth of output. It's not a big enough hit to discourage me from running LL on all 4 cores of my two quad-cores.
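To put those rough figures in per-core terms, here is a small sketch. The 3.0 and 3.5 "cores' worth" values are the recollections quoted above, not measurements, and the helper name is made up for illustration.

```python
# Rough "cores' worth of output" when all 4 cores run LL (recollected, not measured).
CORES_WORTH = {"DDR2-800": 3.0, "DDR2-1066": 3.5}
PHYSICAL_CORES = 4

def per_core_efficiency(cores_worth: float, cores: int = PHYSICAL_CORES) -> float:
    """Fraction of its uncontended speed that each core achieves."""
    return cores_worth / cores

for ram, worth in CORES_WORTH.items():
    eff = per_core_efficiency(worth)
    lost = PHYSICAL_CORES - worth
    print(f"{ram}: {eff:.1%} per core, about {lost:.1f} cores' worth lost to contention")
```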

Whether it's "better" to run LL on all 4 cores or LL-TF-LL-TF is somewhat a matter of opinion. If you just care about total production, then with the current crediting system you're best off alternating. Back when TF only got credited 1/10 as much, it was worth the performance hit to run LL on all 4 cores. If you care about advancing the project, then I believe that LL testing is the way to go. I'm pretty sure the crediting disparity was originally created to motivate LL testing over factoring, and I think with that disparity gone we're going to eventually see the TF leading edge pull away from the LL testing leading edge.
Old 2008-11-24, 16:11   #6
petrw1
1976 Toyota Corona years forever!
 
 
"Wayne"
Nov 2006
Saskatchewan, Canada

5325₁₀ Posts

Quote:
Originally Posted by Phantomas View Post
Hi petrw1,

I think you should give the new v25.8 a try with affinity scrambling set to AffinityScramble=1230 and 2 worker / 2 helper threads for LL tests.

see http://mersenneforum.org/showpost.ph...7&postcount=21
Do I understand correctly that this will give me 2 concurrent LL tests with 2 cores jointly working on each?

And if I do this, will the per-iteration time be close to half, so that whether I do 4 LL tests on separate cores OR 2 tests on 2 cores each, the total elapsed time for 4 tests will be about the same?

Last fiddled with by petrw1 on 2008-11-24 at 16:11
Old 2008-11-24, 18:50   #7
Phantomas
 
 
Oct 2008
Germany, Hamburg

65₁₀ Posts

Quote:
Originally Posted by petrw1 View Post
Do I understand correctly that this will give me 2 concurrent LL tests with 2 cores jointly working on each?
Yes, that's right. And if I interpret my results correctly, each LL test will then run on one dual-core die, so (my impression) one LL test can have the 6MB L2 cache to itself and doesn't need to access ordinary RAM as often.

Quote:
Originally Posted by petrw1 View Post
And if I do this, will the per-iteration time be close to half, so that whether I do 4 LL tests on separate cores OR 2 tests on 2 cores each, the total elapsed time for 4 tests will be about the same?
This is the case in my test with my Q9450. With 4 independent LL tests (2560K), one iteration takes about 54-point-something ms. With 2 LL tests on 2 cores each, it's about 26-point-something ms. So it is in fact a tiny bit faster, and I assume that this is because it's using the L2 cache better. My RAM runs at 1200MHz 6,6,6,15, and maybe the effect is bigger with 800MHz RAM (hope so...)

But it seems to be important to run one test on cores [1,2] and the other on cores [0,3]. Otherwise my iteration time went up 20%.
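The comparison above can be sketched as aggregate throughput. The exact fractions of a millisecond were not given, so 54.0 and 26.0 ms are assumed round values here, and the function name is illustrative only.

```python
def iterations_per_ms(concurrent_tests: int, ms_per_iteration: float) -> float:
    """Total LL iterations completed per millisecond across all running tests."""
    return concurrent_tests / ms_per_iteration

# 4 independent tests, one core each (assumed 54.0 ms per iteration).
four_single = iterations_per_ms(4, 54.0)

# 2 tests, two cores each (assumed 26.0 ms per iteration).
two_paired = iterations_per_ms(2, 26.0)

# Relative gain of the 2x2 layout over 4x1.
gain = two_paired / four_single - 1

print(f"4x1: {four_single:.4f} iter/ms, 2x2: {two_paired:.4f} iter/ms, gain: {gain:.1%}")
```

With these assumed values the 2x2 layout comes out a few percent ahead, consistent with "a tiny bit faster"; the real margin depends on the unquoted decimals.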

Last fiddled with by Phantomas on 2008-11-24 at 18:51
Old 2008-11-24, 20:49   #8
sdbardwick
 
 
Aug 2002
North San Diego Coun

821 Posts

For future reference, all of the above applies to Intel quad core processors prior to the i7 series. Most of the adjustments/tweaks listed probably do not need to be done on i7 systems (at least those with triple channel RAM) and won't have a significant effect on Phenom quad cores.
Old 2008-11-24, 21:56   #9
Phantomas
 
 
Oct 2008
Germany, Hamburg

5×13 Posts

Quote:
For future reference, all of the above applies to Intel quad core processors
Yepp!
Old 2008-11-26, 00:43   #10
ADBjester
 
Aug 2002

11110₂ Posts

Quote:
Originally Posted by Kevin View Post
With your memory, there's not as much of a performance hit as most people get when running LL tests on all 4 cores. I think it's along the lines of: with DDR2-800 or below, you only get something like 3 cores' worth of output, but with DDR2-1066 it's closer to 3.5 cores' worth of output. It's not a big enough hit to discourage me from running LL on all 4 cores of my two quad-cores.
The bandwidth of the RAM itself is not the limiting factor. The limiting factor is the memory bus itself, and contention for it by four cores. 800 MHz QDR RAM is more than enough for what we do. Moving to 1066 MHz won't add any more than 5% to your performance.

Of far greater concern to him is the chipset... as the nVidia chipsets have far greater problems with all four cores demanding high-volume access to the memory bus, whereas the Intel chipsets are much, MUCH better.

If he's running nVidia, 2 LL and 2 TF are about optimal. If he's running Intel, then 4 LL are fine -- downshifting to 3 LL and 1 TF doesn't buy you any improvement.

Jester
Old 2008-11-26, 06:17   #11
S485122
 
 
"Jacob"
Sep 2006
Brussels, Belgium

2·977 Posts

Quote:
Originally Posted by ADBjester View Post
The bandwidth of the RAM itself is not the limiting factor. The limiting factor is the memory bus itself, and contention for it by four cores. 800 Mhz QDR RAM is more than enough for what we do. Moving to 1066 Mhz won't add any more than 5% to your performance.
I do not agree: on the P4 D and on the Core2 Quad, the performance of Prime95 was proportional to the memory speed (measured from 533 MHz DDR2 to 1066 MHz DDR2).
Quote:
Originally Posted by ADBjester View Post
... the nVidia chipsets have far greater problems with all four cores demanding high-volume access to the memory bus, whereas the Intel chipsets are much, MUCH better.
Yes
Quote:
Originally Posted by ADBjester View Post
If he's running nVidia, 2 LL and 2 TF are about optimal. If he's running Intel, then 4 LL are fine -- downshifting to 3 LL and 1 TF doesn't buy you any improvement.
I don't agree: on the P35, 965 and G965 chipsets running 3 LL + 1 TF, the core doing LL and sharing a die with TF sees a 6% to 12% improvement over the cores of the die where both are doing LL tests.

Jacob
All times are UTC.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
