|
|
#1 |
|
Oct 2008
22 Posts |
I just upgraded from v24 to 25.7, build 3.
Running an Intel T2500 @ 2 GHz, 2 cores, WinXP SP2. I upgraded from 2 instances of p95 to one with 2 worker threads. Where my old instances used to keep the CPU pegged near 100% between the 2 of them, the new version isn't using very much at all. I found that I had a Throttle command in my prime95.txt from before, but I removed it. Why isn't p95 using my CPU? |
|
|
|
|
|
#2 |
|
6809 > 6502
"""""""""""""""""""
Aug 2003
101×103 Posts
3·7·17·31 Posts |
What numbers (range) and type of work are you doing?
Are both workers running on 1 number, or is each doing its own? I found that 2 workers factoring a single number have ~55% of the throughput of 2 workers factoring 2 different numbers. |
|
|
|
|
|
#3 | |
|
Oct 2008
Riga, Latvia
B₁₆ Posts |
Quote:
On another box I have 2*L5420 (quad-core), and I get the same throughput for 4*V24 with affinity 0,3,4,6 as for 2*V24 + 1*V25 (2 threads), doing double-checks. But if I use any other core combination (not 0,3,4,6), throughput drops by at least ~30% :( |
|
|
|
|
|
|
#4 |
|
Oct 2008
22 Posts |
Factoring 2 numbers in the 4X,XXX,XXX range, each doing their own.
|
|
|
|
|
|
#5 |
|
Oct 2008
22 Posts |
Hmm. It looks like the Throttle command came back or I didn't kill it completely somehow. Removed it again, and now things seem fine.
|
|
|
|
|
|
#6 | |
|
P90 years forever!
Aug 2002
Yeehaw, FL
17·487 Posts |
Quote:
|
|
|
|
|
|
|
#7 | |
|
Oct 2008
Riga, Latvia
11 Posts |
Quote:
If I set ThreadsPerTest=2 to use all 8 logical CPUs for 4 workers, I need to set affinity to 0,1,4,5 (prime95 then selects 2,3,6,7 for the helper threads). In this case I get 100% CPU and approximately 0.047 s per iteration, so there is no advantage from the helper thread. I want to try setting the main thread to CPU0 and the helper thread to CPU1, but I can't see how.

On the 2*L5420 the picture is more complex :). Again, I do not know why, but if I run just one worker with ThreadsPerTest=1, I get ~0.028 s per iteration. If one or more workers are added, throughput goes down, but very differently depending on affinity. Four worker threads can drop to ~0.055 s in a "bad" combination, or stay at ~0.030 s in a "good" one. At the moment I have 3 workers with the following parameters:

Worker No.  Threads  CPU  Timing
1           1        0    0.029
2           2        3,4  0.016
3           3        6    0.031

Any other combination gives ~0.042 s per iteration for at least 2 workers. I do not know why; "it's magic" ;) |
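To compare the configurations quoted above on one scale, it can help to convert seconds per iteration into total iterations per second across all workers. This is only a rough sketch: the function name `throughput` is made up for illustration, and it assumes every worker is testing exponents of similar size, so that one iteration costs about the same everywhere.

```python
def throughput(timings):
    """Total iterations per second, given each worker's seconds per iteration."""
    return sum(1.0 / t for t in timings)

# Timings quoted in the post for the 2*L5420 box (seconds per iteration).
mixed = throughput([0.029, 0.016, 0.031])  # 3 workers, the middle one using 2 threads
good4 = throughput([0.030] * 4)            # 4 workers, "good" affinity
bad4 = throughput([0.055] * 4)             # 4 workers, "bad" affinity

for name, value in [("mixed", mixed), ("good4", good4), ("bad4", bad4)]:
    print(f"{name}: {value:.1f} iterations/s")
```

Under that assumption the 3-worker setup lands close to the "good" 4-worker setup, and both are well ahead of the "bad" one.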
|
|
|
|
|
|
#8 |
|
Nov 2008
San Luis Obispo CA
27 Posts |
The Xeon 5400 Series Quad-Core CPUs are all twin dual-core dies. Each die (or pair of cores) contains 6 MB of the total 12 MB L2 cache:
http://download.intel.com/design/xeo...pdt/318585.pdf
Quad-core desktop CPUs are similar. Hence, threads on cores 0 and 1 compete for the same cache, but do not compete for the cache shared by cores 2 and 3. Threads running (as you state) on cores 0/3/4/6 would each have their own cache and would not compete. But 0/2/4/6 should equally not compete. The unknowns are your OS and other processes, which may be cache- or CPU-intensive. Without knowing these, it certainly could seem like "magic". You may be able to set affinity for some of the system threads to increase performance. |
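The cache argument above can be checked mechanically for any affinity list. The sketch below assumes that cores are numbered so that each consecutive pair (0,1), (2,3), ... sits on one die and shares its L2 cache; the OS may number cores differently, and `cache_conflicts` is a hypothetical helper, not part of Prime95.

```python
from collections import defaultdict

def cache_conflicts(cores, cores_per_die=2):
    """Return {die: [cores]} for every die that hosts more than one chosen core.

    Assumes cores 2k and 2k+1 share die k, which may not match the OS numbering.
    """
    by_die = defaultdict(list)
    for core in cores:
        by_die[core // cores_per_die].append(core)
    return {die: chosen for die, chosen in by_die.items() if len(chosen) > 1}

print(cache_conflicts([0, 3, 4, 6]))  # no shared dies under this numbering
print(cache_conflicts([0, 2, 4, 6]))  # also no shared dies
print(cache_conflicts([0, 1, 2, 3]))  # two dies each carrying two threads
```

If the first two combinations really perform differently on a given machine, that suggests the numbering assumption does not hold there, or that something else is competing for cache.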
|
|
|
|
|
#9 | |
|
P90 years forever!
Aug 2002
Yeehaw, FL
2057₁₆ Posts |
Quote:
|
|
|
|
|
|
|
#10 |
|
Nov 2008
32 Posts |
Hi George,
then I can, by trial, prove your documentation wrong, at least on Win Vista 64-bit and a Core i7 940, and I agree with Oleg V.Cat. Here is what I did: first, tell Prime95 to run on any CPU. Then start an LL test with 4 threads. Use Task Manager to bind Prime95 manually to CPUs.

Binding to CPUs 0,1,2,3: best per-iteration time 11.435 ms
Binding to CPUs 0,2,4,6: best per-iteration time 6.833 ms

Seems quite obvious to me, no? CPUs 0,2,4,6 are "real". Definitely. Maybe Intel uses a different numbering scheme than Microsoft? Whatever: the measurements don't lie.

IMHO the best performance on a Core i7 could be reached by assigning 4 LL tests to it, starting two threads for each LL test, and assigning the first test to the first real CPU/HT pair (0,1), the second to (2,3), and so on. I also agree with Oleg on that. So I'm pretty convinced that your proposed assignment strategy for 25.8 would totally NOT work well :-)

Last fiddled with by Meikel on 2008-11-22 at 01:57 |
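For anyone repeating this experiment, the two bindings and the proposed pairing can be written down as follows. This is a sketch only: `affinity_mask` and `ht_pairs` are illustrative names, and the pairing assumes that logical CPUs 2i and 2i+1 are hyperthread siblings of one physical core, which is what the measurements above suggest for this machine but is not guaranteed elsewhere.

```python
def affinity_mask(cpus):
    """Bitmask with one bit set per logical CPU number."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask

def ht_pairs(physical_cores):
    """Assumed sibling pairs: logical CPUs (2i, 2i+1) for each physical core i."""
    return [(2 * i, 2 * i + 1) for i in range(physical_cores)]

print(hex(affinity_mask([0, 1, 2, 3])))  # the slower binding
print(hex(affinity_mask([0, 2, 4, 6])))  # the faster binding
print(ht_pairs(4))                       # one pair per LL test on a 4-core i7

# Ratio of the two best per-iteration times quoted above (ms).
speedup = 11.435 / 6.833
print(f"0,2,4,6 is about {speedup:.2f}x faster than 0,1,2,3")
```

The masks are just a compact way to record which CPUs were ticked in Task Manager; the ratio shows how large the gap between the two bindings was in this one test.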
|
|
|
|
|
#11 |
|
P90 years forever!
Aug 2002
Yeehaw, FL
17×487 Posts |
|
|
|
|