#12 | |
|
Oct 2008
Germany, Hamburg
65₁₀ Posts
Hi,
Quote:
With 4 independent worker threads I'm getting the following for the M46336001 LL test:
Code:
4 Workers / 1 Thread   [0],[1],[2],[3]   W1-W4 = ~54ms
Code:
2 Workers / 2 Threads   [0,1], [2,3]   W1 32ms   W2 32ms
P95 sets the affinity to:
Code:
[Nov 22 14:04] Worker starting
[Nov 22 14:04] Setting affinity to run worker on logical CPU #3
[Nov 22 14:04] Setting affinity to run helper thread 1 on logical CPU #4
Code:
2 Workers / 2 Threads   [1,2], [3,0]   W1 27ms   W2 27ms
As far as I know, core 0 and core 1 share one C2 die, and core 2 and core 3 share the other C2 die. So it seems that running one worker across different C2 dies is faster. I can't prove it, but [1,2] [0,3] should produce the same result as [0,2] [1,3].

I would prefer a new option to set something like a core increment. A core increment of one would assign core+(n*1) to the n'th thread of a worker: [0,1, ...] [2,3, ...]. A core increment of two would give [0,2, ...] [1,3, ...].

Last fiddled with by Phantomas on 2008-11-22 at 13:36
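The core-increment idea can be sketched in a few lines of Python. This is only an illustration of the suggestion in this post, not anything Prime95 implements: the function name `assign_cores` and all of its parameters are made up, and the rule that each worker starts at the lowest core still unused is an assumption needed to reproduce the two example layouts.

```python
def assign_cores(num_cores, num_workers, threads_per_worker, increment):
    """Hypothetical core-increment layout: the threads of one worker are
    spaced `increment` cores apart, and each worker starts at the lowest
    core that is still unused (an assumption, see the lead-in)."""
    free = set(range(num_cores))
    layout = []
    for _ in range(num_workers):
        if not free:
            raise ValueError("not enough cores for all workers")
        start = min(free)
        cores = [(start + t * increment) % num_cores
                 for t in range(threads_per_worker)]
        # Refuse layouts that would put two threads on the same core.
        if len(set(cores)) != len(cores) or not set(cores) <= free:
            raise ValueError("increment does not fit this configuration")
        free -= set(cores)
        layout.append(cores)
    return layout


print(assign_cores(4, 2, 2, increment=1))  # [[0, 1], [2, 3]]
print(assign_cores(4, 2, 2, increment=2))  # [[0, 2], [1, 3]]
```

With an increment of one the threads of a worker sit on neighboring cores, with an increment of two they are spread across the assumed die boundary.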
#13 |
|
Nov 2008
3² Posts
Hello Phantomas,
I think you need to take into account that Prime95's output for the helper threads and for the main threads seems to use different numbering. Helper threads seem to number the CPUs starting with 1 (a quad core would have CPU numbers 1, 2, 3, 4), while the main threads start numbering with 0 (a quad core has CPU numbers 0, 1, 2, 3). At least, this would explain the ominous "CPU 4" on a quad core, which only appears in the helper thread output.

Given this uncertainty, please watch the Performance tab of the Task Manager when playing with CPU assignments. There you can see which CPUs are idling and draw some more conclusions. This way, you might work out the logic behind your measurements :-)

I am very glad about the way George decided to separate the workloads for 25.8, but it might still make sense to allow users to hand-tune worker and helper thread affinity via the GUI or the INI file. I can only try things and measure with the Core i7 (and my ancient Athlon X2), but with the Xeons it might be totally different.

With best regards to you all,
Michael
#14 | |||
|
Oct 2008
Germany, Hamburg
41₁₆ Posts
Hi Meikel,
Quote:
This numbering is also used in the log file in the "Setting affinity..." line. The CPU #n starts at #0, and so should end with #3. But it shows #3 and #4 when the first core is set to the highest core. If I set 4 threads for one worker and set the CPU affinity to #4, hence core 3, it outputs CPUs #3, #4, #5, #6. So there is a bug. (Missing modulo number of cores?) I'm not referring to the numbering in the "Worker Window" -> "CPU affinity". This field uses a range from 1 to 4.
Quote:
....Quote:
As long as I can bind the worker threads to something other than only the next core, it will be fine for me.
regards, Jörg
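To make the suspected missing modulo concrete, here is a minimal Python sketch. It is not Prime95's code; the function `thread_cpus` and its parameters are hypothetical and only model the rule "each further thread goes to the next logical CPU", with and without wrapping at the number of cores.

```python
def thread_cpus(first_cpu, num_threads, num_cores, wrap=True):
    """Hypothetical model: the worker runs on `first_cpu` and each helper
    thread takes the next logical CPU. With wrap=True the CPU number is
    reduced modulo the core count; with wrap=False it is not."""
    cpus = []
    for offset in range(num_threads):
        cpu = first_cpu + offset
        if wrap:
            cpu %= num_cores  # the step that appears to be missing
        cpus.append(cpu)
    return cpus


# One worker with 4 threads starting on core 3 of a quad core:
print(thread_cpus(3, 4, 4, wrap=False))  # [3, 4, 5, 6], as seen in the log
print(thread_cpus(3, 4, 4, wrap=True))   # [3, 0, 1, 2], what one would expect
```

Whether the program actually binds to a wrapped CPU and only prints the unwrapped number cannot be told from the log alone.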
#15 | ||
|
P90 years forever!
Aug 2002
Yeehaw, FL
20127₈ Posts
Quote:
Quote:
The default with 25.8 on your machine will be [0,1] and [2,3]. Helper threads are assigned to the next logical CPU, so you can test your theory in 25.8 by setting Worker 1 to run on CPU 1 and Worker 2 to run on CPU 3. This should get you [1,2] and [3,any]. Please keep the data coming. Data from quad cores, hyperthreaded dual cores, hyperthreaded quad cores, and multi-CPU systems are of the most interest. |
#16 | |
|
Oct 2008
Germany, Hamburg
5·13 Posts |
Quote:
Maybe I can get access to a hyperthreaded dual and/or a multi-CPU system...
#17 |
|
P90 years forever!
Aug 2002
Yeehaw, FL
8279₁₀ Posts
#18 | |
|
Oct 2008
Germany, Hamburg
1000001₂ Posts
Quote:
Thank you George, happy to hear that!! |
#19 | |
|
Nov 2008
9₁₆ Posts
Quote:
#20 |
|
P90 years forever!
Aug 2002
Yeehaw, FL
17·487 Posts |
I put up a sneak peek of 25.8 for 32-bit Windows
#21 |
|
Oct 2008
Germany, Hamburg
5×13 Posts |
Hi George,
I did some testing with the new P95 v25.8.
Quad core Q9450 (4 cores, no HT) / WXP Pro
2 Workers / 2 Threads, 2560K LL test
AffinityScramble=0123 33ms
AffinityScramble=0213 33ms
AffinityScramble=1230
AffinityScramble=2103
AffinityScramble=0312 27 ms
That's weird! As far as I know, cores on different dies have to communicate via the FSB, and this should be somewhat slower than communication with the neighboring core. So I assume that cores 1+2 and cores 0+3 each share one Core2Duo die. But this doesn't match the core numbering scheme I remember and found at http://img261.imageshack.us/img261/4...mberingtp3.jpg
Fortunately, you didn't implement it like I suggested with a core increment :-)
Last fiddled with by Phantomas on 2008-11-24 at 00:48
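The reasoning above reads each AffinityScramble string as the order in which cores are handed out to threads, worker by worker. That reading is an assumption taken from this post, not a documented description of the option. Under that assumption, a small Python sketch (the helper `scramble_to_layout` is hypothetical) shows which core pairs each tested string would produce:

```python
def scramble_to_layout(scramble, threads_per_worker):
    """Assumed reading of an AffinityScramble string: the digits are core
    numbers in hand-out order, split into one group per worker."""
    cores = [int(ch) for ch in scramble]
    if len(cores) % threads_per_worker != 0:
        raise ValueError("string length must be a multiple of threads_per_worker")
    return [cores[i:i + threads_per_worker]
            for i in range(0, len(cores), threads_per_worker)]


# The three settings with a reported timing (2 workers, 2 threads each):
for scramble, ms in [("0123", 33), ("0213", 33), ("0312", 27)]:
    print(scramble, scramble_to_layout(scramble, 2), f"{ms} ms")
```

Under this reading only the pairing [0,3] and [1,2] was faster, which is what leads to the guess that those cores share a die.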
#22 |
|
Nov 2008
32 Posts |
When I started the program, it asked me how much memory I wanted to use, which is a gig, so I entered 1000 MB, but it said it can only use 921, so I went with that. The first work it started took very little RAM, but when it moved on to another type of work it used 921 MB of RAM, and Task Manager showed 1.13 GB in use. I thought maybe it was taking too much RAM, so I set it to 800 to give the RAM some room, and that worked. So instead of the CPU going down more than up, it is back up to 100%. It sounds like a bug in the Prime program.