mersenneforum.org  

2008-11-22, 13:32   #12
Phantomas
Oct 2008
Germany, Hamburg
65₁₀ Posts

Hi,

Quote:
Originally Posted by Prime95 View Post
You lost me completely! I'm guessing you think the 4*XeonMP should be CPU 0,1 and CPU 2,3, etc. I can't figure out what you are doing on the 2*L5420 and what you think it should be doing.
I think that isn't optimal. I ran some tests on my Q9450 quad-core (2 × 2 dual-core dies).

With 4 independent worker threads, I get the following for an LL test of M46336001:
Code:
4 Workers / 1 Thread  [0],[1],[2],[3]
W1-W4 = ~54ms
With 2 worker threads:
Code:
2 Workers / 2 Threads [0,1], [2,3]
w1 32ms
w2 32ms
But when I set the first worker to #2 (core 1), P95 sets the affinity to:
Code:
[Nov 22 14:04] Worker starting
[Nov 22 14:04] Setting affinity to run worker on logical CPU #3
[Nov 22 14:04] Setting affinity to run helper thread 1 on logical CPU #4
But there is no core 4, so the affinity isn't set at all for this thread, and it mostly runs on core 0.

Code:
2 Workers / 2 Threads [1,2], [3,0]
W1 27ms
w2 27ms
So this setting is the fastest for me.
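
To compare these configurations on total throughput rather than per-iteration time, the timings above can be converted to iterations per second summed over all workers. This is only a minimal sketch: the function name is made up for illustration, and the ~54 ms figure is approximate in the measurements above.

```python
def iterations_per_second(workers, ms_per_iteration):
    """Total LL iterations per second, summed over all workers."""
    return workers * 1000.0 / ms_per_iteration

# (label, number of workers, ms per iteration) taken from the timings above
configs = [
    ("4 workers x 1 thread  [0],[1],[2],[3]", 4, 54.0),
    ("2 workers x 2 threads [0,1],[2,3]", 2, 32.0),
    ("2 workers x 2 threads [1,2],[3,0]", 2, 27.0),
]

for label, workers, ms in configs:
    print(f"{label}: {iterations_per_second(workers, ms):.1f} iterations/s")
```

By this measure the [1,2], [3,0] layout roughly matches the total throughput of four single-threaded workers, while each individual test runs at about half the per-iteration time.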

As far as I know, core 0 and core 1 share one C2 die, and core 2 and core 3 share the other C2 die.

So it seems that running one worker across different C2 dies is faster.
I can't prove it, but [1,2], [0,3] should produce the same result as [0,2], [1,3].

I would prefer a new option to set something like a core-increment.

A core increment of one would assign core+(n*1) to a worker's n-th thread:
[0,1, ...]
[2,3, ...]
A core increment of two would give:
[0,2 ...]
[1,3 ...]
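
The core-increment idea can be sketched as follows. This only illustrates the proposal and is not anything Prime95 implements; `core_increment_layout` and its parameters are hypothetical names, and the sketch does not guard against increments that would make two workers overlap.

```python
def core_increment_layout(num_cores, threads_per_worker, increment):
    """Hypothetical layout: thread n of a worker runs on
    (start + n * increment) modulo num_cores."""
    workers = []
    used = set()
    for start in range(num_cores):
        if start in used:
            continue  # this core already belongs to an earlier worker
        cores = [(start + n * increment) % num_cores
                 for n in range(threads_per_worker)]
        used.update(cores)
        workers.append(cores)
    return workers

# Quad-core, 2 threads per worker, as in the examples above
print(core_increment_layout(4, 2, 1))
print(core_increment_layout(4, 2, 2))
```

For a quad-core with two threads per worker, an increment of one yields [0,1] and [2,3], and an increment of two yields [0,2] and [1,3].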

Last fiddled with by Phantomas on 2008-11-22 at 13:36

2008-11-22, 13:45   #13
Meikel
Nov 2008
3² Posts

Hello Phantomas,

I think you need to take into account that Prime95's output for the helper threads and for the main threads seems to use different numbering. Helper threads seem to number the CPUs starting with 1 (a quad-core would have CPU numbers 1,2,3,4), while the main threads start numbering with 0 (a quad-core has CPU numbers 0,1,2,3). At least, this would explain the existence of the ominous "CPU 4" on a quad-core, which only appears in the helper thread output.

Given this uncertainty, please watch the Performance tab of the Task Manager when playing with CPU assignments. There you can see which CPUs are idling and draw some more conclusions. This way, you might work out the logic behind your measurements :-)

I am very glad about the way George decided to separate the workloads for 25.8, but it might still make sense to allow users to hand-tune worker and helper thread affinity via the GUI or the INI file. I can only try things and measure with the Core i7 (and my ancient Athlon X2), but with the Xeons it might be totally different.

With best regards to you all,
Michael

2008-11-22, 14:17   #14
Phantomas
Oct 2008
Germany, Hamburg
41₁₆ Posts

Hi Meikel,

Quote:
Originally Posted by Meikel View Post
I think you need to take into account that Prime95's output for the helper threads and for the main threads seems to use different numbering. Helper threads seem to number the CPUs starting with 1 (a quad-core would have CPU numbers 1,2,3,4), while the main threads start numbering with 0 (a quad-core has CPU numbers 0,1,2,3). At least, this would explain the existence of the ominous "CPU 4" on a quad-core, which only appears in the helper thread output.
Yes, I know that; I'm referring to core numbers 0 to 3 only.
This numbering is also used in the log file in the "Setting affinity..." line.
The CPU #n numbering starts at #0 and so should end at #3. But it shows #3 and #4 when the worker's first core is set to the highest core.

If I set 4 threads for one worker and set the CPU affinity to #4, hence core 3, it outputs CPUs #3, #4, #5, #6. So there is a bug. (Missing modulo number of cores?)
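
The suspected missing modulo can be illustrated with a short sketch. This is not Prime95's actual code; `helper_cpus` and its parameters are hypothetical names, and wrapping around is only one possible fix (assumed here for illustration).

```python
def helper_cpus(first_cpu, num_threads, num_cpus, wrap=True):
    """Logical CPUs (0-based) for a worker and its helper threads.

    Each helper thread takes the next logical CPU after the worker's.
    With wrap=True the number is reduced modulo num_cpus, so it can
    never point past the last core.
    """
    cpus = []
    for i in range(num_threads):
        cpu = first_cpu + i
        if wrap:
            cpu %= num_cpus
        cpus.append(cpu)
    return cpus

# Worker on core 3 of a quad-core with 4 threads
print(helper_cpus(3, 4, 4, wrap=False))  # reproduces the out-of-range numbers
print(helper_cpus(3, 4, 4, wrap=True))   # wraps back to core 0
```

Without the modulo the sketch produces #3, #4, #5, #6 as in the log output described above; with it, the threads land on cores 3, 0, 1, 2.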

I'm not referring to the numbering in the "Worker Window" -> "CPU affinity" field. That field uses a range from 1 to 4.


Quote:
please watch the Performance-tab
I did that ....

Quote:
I am very glad about the way George decided to separate the workloads for 25.8
I must have missed that in the forum....
As long as I can bind the worker threads to something other than just the next core, that will be fine for me.

regards, Jörg

2008-11-22, 15:21   #15
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
20127₈ Posts

Quote:
Originally Posted by Phantomas View Post
[Nov 22 14:04] Setting affinity to run helper thread 1 on logical CPU #4
But there is no core 4, so the affinity isn't set at all for this thread, and it mostly runs on core 0.
That's a bug. I'll fix it.

Quote:
So it seems that running one worker across different C2 dies is faster.
I can't prove it, but [1,2], [0,3] should produce the same result as [0,2], [1,3].

I would prefer a new option to set something like a core-increment.
No core-increment feature until we get more data.
The default with 25.8 on your machine will be [0,1] and [2,3].
Helper threads are assigned to the next logical CPU, so you can test your theory in 25.8 by setting Worker 1 to run on CPU 1 and Worker 2 to run on CPU 3. This should get you [1,2] and [3,any].

Please keep the data coming. Data from quad cores, hyperthreaded dual cores, hyperthreaded quad cores, and multi-CPU systems are of the most interest.

2008-11-22, 16:31   #16
Phantomas
Oct 2008
Germany, Hamburg
5·13 Posts

Quote:
Originally Posted by Prime95 View Post
That's a bug. I'll fix it.

No core-increment feature until we get more data.
Please keep the data coming. Data from quad cores, hyperthreaded dual cores, hyperthreaded quad cores, and multi-CPU systems are of the most interest.
I'll try my very best.
Maybe I can get access to a hyperthreaded dual-core and/or a multi-CPU system....

2008-11-22, 19:00   #17
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
8279₁₀ Posts

Quote:
Originally Posted by Prime95 View Post
No core-increment feature until we get more data.
I give in. Look for the AffinityScramble option in the next undoc.txt.

2008-11-22, 19:32   #18
Phantomas
Oct 2008
Germany, Hamburg
1000001₂ Posts

Quote:
I give in. Look for AffinityScramble option in next undoc.txt


Thank you George, happy to hear that!!

2008-11-23, 12:03   #19
Meikel
Nov 2008
9₁₆ Posts

Quote:
Please keep the data coming. Data from quad cores, hyperthreaded dual cores, hyperthreaded quad cores, and multi-CPU systems are of the most interest.
Your wish is my command! I'll be happy to test whatever possible on the i7 as soon as 25.8 is out.

2008-11-23, 22:29   #20
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
17·487 Posts

I put up a sneak peek of 25.8 for 32-bit Windows

2008-11-24, 00:44   #21
Phantomas
Oct 2008
Germany, Hamburg
5×13 Posts
v25.8 Tests

Hi George,

I did some testing with the new P95 v25.8.

Quadcore Q9450 (4 Cores, no HT) / WXP - Pro
2 Workers / 2 Threads 2560K-LL-Test

AffinityScramble=0123
33ms

AffinityScramble=0213
33ms

AffinityScramble=1230
AffinityScramble=2103
AffinityScramble=0312
27 ms

That's weird!!!

As far as I know, cores on different dies have to communicate via the FSB, and this should be somewhat slower than communication with the neighboring core.

So I assume that cores 1+2 and cores 0+3 each share one Core 2 Duo die.

But this doesn't match the core-numbering scheme I remember and found at
http://img261.imageshack.us/img261/4...mberingtp3.jpg

Fortunately, you didn't implement it like I suggested with a Core-Increment :-)
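
One way to read these results is to look at which cores each AffinityScramble string pairs up. The sketch below is an interpretation, not Prime95's implementation: it assumes each digit names a core and that consecutive digits are handed to the same worker (two threads per worker, as in this test), and it assumes the 27 ms figure applies to all three settings listed above it. The helper names are hypothetical.

```python
def worker_groups(scramble, threads_per_worker=2):
    """Split a scramble string into per-worker core lists (assumed semantics)."""
    cores = [int(ch) for ch in scramble]
    return [cores[i:i + threads_per_worker]
            for i in range(0, len(cores), threads_per_worker)]

def pairing(scramble):
    """Order-independent view: which cores end up sharing a worker."""
    return frozenset(frozenset(group) for group in worker_groups(scramble))

# Timings (ms per iteration) as reported in this post
timings = {"0123": 33, "0213": 33, "1230": 27, "2103": 27, "0312": 27}

for scramble, ms in timings.items():
    print(scramble, worker_groups(scramble), f"{ms} ms")
```

Under that assumption, all three 27 ms settings put cores {1,2} and {0,3} together, while the two 33 ms settings use the pairings {0,1},{2,3} and {0,2},{1,3}.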

Last fiddled with by Phantomas on 2008-11-24 at 00:48

2008-11-24, 08:13   #22
Sandman192
Nov 2008
32 Posts
May have found all your problems with CPU problem.

When I started the program, it asked me how much memory I wanted to use. I have a gig, so I entered 1000 MB, but it said it could only use 921, so I went with that. The first work it started took very little RAM, but when it moved on to another type of work it used 921 MB of RAM, with Task Manager showing 1.13 GB in use. So I thought maybe it was taking too much RAM, and I set it to 800 to leave some room, and it worked. Instead of the CPU usage dropping more than rising, it is back up to 100%. It sounds like a bug in the Prime program.

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.