#1
|
Jan 2006
Lower Hutt, New Zealand.
3 Posts
I'm flogging an HP xw6200 workstation: dual 3200 MHz Xeon CPUs with 1 GB of memory and Hyper-Threading enabled, so that there are "four" CPUs. Also crunching are distributed.net's happy cow and climateprediction's model.exe, and I'm watching it all with TaskInfo2002.
With one instance of Prime95 running and its processor affinity set to cpu3, cpu3's graph shows full occupancy (all green), but Prime95's CPU utilisation is reported not as 25% (or 24.9%) but as around 16%, and the system idle time is around 35%. With the processor affinity deactivated (and a stop/continue), no CPU graph is filled with green, but Prime95's CPU usage is now 25%, as it is for distributed.net and model.exe. Yet Prime95's log shows that for M31313641 stage 2 (~% complete) the steps took 2334, 1898 and 1944 secs with affinity set, but only 1143 and 1167 secs with it off. This suggests I would get faster crunching with the affinity feature inactive, which is the opposite of what is suggested.
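The numbers in the post above can be put side by side with a little arithmetic. With four logical CPUs, a program that keeps one of them fully busy should show up as 1/4 = 25% of the machine in a per-process utilisation column (a simple model, assumed here), and the quoted step times imply that the affinity-on steps took roughly 1.8 times as long. A minimal sketch, using only the timings from the log; the names `WITH_AFFINITY`, `WITHOUT_AFFINITY` and `expected_share` are hypothetical:

```python
# Step timings (seconds) from the Prime95 log quoted above, M31313641 stage 2.
WITH_AFFINITY = [2334, 1898, 1944]
WITHOUT_AFFINITY = [1143, 1167]


def mean(xs):
    return sum(xs) / len(xs)


def expected_share(busy_logical_cpus, logical_cpus):
    """Share of the whole machine a task monitor would report if a program
    kept `busy_logical_cpus` of `logical_cpus` fully occupied (simple model)."""
    return busy_logical_cpus / logical_cpus


# Ratio of mean step times: how much longer a step took with affinity set.
slowdown = mean(WITH_AFFINITY) / mean(WITHOUT_AFFINITY)

if __name__ == "__main__":
    print(f"one saturated logical CPU out of 4 = {expected_share(1, 4):.0%}")
    print(f"mean step, affinity on:  {mean(WITH_AFFINITY):.0f} s")
    print(f"mean step, affinity off: {mean(WITHOUT_AFFINITY):.0f} s")
    print(f"affinity-on steps take {slowdown:.2f}x as long")
```

Only five steps are averaged, so treat the ratio as a rough indication rather than a benchmark.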
|
|
|
|
|
#2
|
Jul 2004
Nowhere
14518 Posts
You don't have 4 CPUs, you have 4 virtual CPUs, and there is a big difference. Just because it shows 2x the amount of physical CPUs doesn't mean it does twice the work. Try stopping all your other clients and run Prime95 on proc 0 or 1.
|
|
|
|
|
#3
|
Jan 2006
Lower Hutt, New Zealand.
3 Posts
Yes, I know that there are only two real CPUs, each with extra registers that let it simulate two CPUs, and that all of them contend for the same data path to memory, which is much slower than the internal memory.
However, tests have shown that although an additional cruncher slows all crunchers (because of extra contention for memory access), the nett throughput nevertheless increases because more crunchers are active. In other words, the per-cruncher slowdown factor was smaller than the increase in the number of crunchers. This was the case with seticrunch (in four instances), but I haven't had time to run similar tests with the Mersenne prime cruncher.
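The condition in the post above can be written as a break-even inequality. Writing $n$ for the number of crunchers and $t$ for the wall-clock time each takes per work unit (symbols introduced here for illustration, assuming all crunchers are identical), production is $n/t$, so moving from $n_1$ to $n_2$ crunchers pays off exactly when the per-cruncher slowdown is smaller than the growth in cruncher count:

$$
\frac{n_2}{t_2} > \frac{n_1}{t_1}
\iff
\frac{t_2}{t_1} < \frac{n_2}{n_1}
$$

For example, with the seti@home times reported later in this thread, going from one to four crunchers on four logical CPUs gives $t_2/t_1 \approx 172/113 \approx 1.52 < 4$, so throughput rises.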
|
|
|
|
|
#4
|
∂²ω=0
Sep 2002
República de California
1175610 Posts
If the application in question completely saturates the FPU (as does Prime95), running more instances than there are physical CPUs (i.e. FPUs) will not (in fact cannot) increase throughput.
|
|
|
|
|
#5
|
Jul 2004
Nowhere
809 Posts
Hmm, it's like how you can't get 2 tomatoes by cutting a tomato that is 1.5 times larger than a regular tomato in half....
|
|
|
|
|
#6
|
Jun 2003
155816 Posts
How does the affinity feature work? If we set affinity, will the OS schedule the process _only_ on that particular CPU or will it try to schedule _as much as possible_ on that particular CPU?
@Nicky McLean: Have you tried setting affinity to either 0 or 1? AFAIK, CPU 0 & 1 are the "real" CPUs. CPU 2 & 3 are "virtual" CPUs.
Last fiddled with by axn on 2006-01-25 at 06:10
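On the question in #6: a process affinity mask is a hard restriction, not a hint. On Windows, a mask set through the Win32 call `SetProcessAffinityMask` limits the process's threads to the listed logical processors; a soft preference is a separate mechanism (`SetThreadIdealProcessor`). The same semantics can be seen with Python's `os.sched_setaffinity`, which is Linux-only. The sketch below, assuming a Linux machine and using a hypothetical helper `run_pinned`, pins the current process to one logical CPU, runs some work, and restores the original mask:

```python
import os


def run_pinned(cpu, work):
    """Run work() while this process may use only logical CPU `cpu`,
    then restore the original affinity mask.

    Linux-only sketch: os.sched_setaffinity / os.sched_getaffinity are not
    available on Windows, where the comparable call is SetProcessAffinityMask.
    """
    original = os.sched_getaffinity(0)  # pid 0 = the calling process
    os.sched_setaffinity(0, {cpu})      # hard limit: only `cpu` is allowed
    try:
        return work()
    finally:
        os.sched_setaffinity(0, original)


if __name__ == "__main__":
    allowed = sorted(os.sched_getaffinity(0))
    target = allowed[-1]  # e.g. "cpu3" on a 4-way machine
    # While pinned, the mask reported by the kernel contains only `target`.
    mask = run_pinned(target, lambda: os.sched_getaffinity(0))
    print(f"pinned to CPU {target}: mask was {sorted(mask)}")
    print(f"restored mask: {sorted(os.sched_getaffinity(0))}")
```

The target is picked from the currently allowed set because requesting a CPU outside it (for example one excluded by a container's cpuset) raises an error.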
|
|
|
|
|
#7
|
Jan 2006
Lower Hutt, New Zealand.
3 Posts
I tried setting the affinity, and indeed only one of the "four" CPUs was active. I haven't yet run enough tests to report further on the behaviour with or without other crunchers running.
I guessed that 0 and 1 were the first real CPU, with 2 and 3 the second, but I have no basis for this and axn1's thought is just as valid a possibility. One imagines that each real CPU has its own real floating-point circuitry, but that there are not four of them, merely the appearance of four. There is a lot of circuitry in an FPU to accommodate all the little tricks that hasten its crunching (and to be bungled, as in the Pentium FDIV bug), so two sets in each real CPU seems unlikely, though there may be two sets of registers in each to facilitate switching.

To supply some specifics, here are test runs of the (now discontinued) seti@home command-line cruncher, all working on the same work unit:

| Time per WU | CPUs | Crunchers |
|---|---|---|
| 113m07s | four | one |
| 122m14s | four | two |
| 172m20s | four | four |
| 112m06s | two | one |
| 114m07s | two | two |

For the four-CPU state these reduce to the following production rates:
1/113 = 0·00885 x 1 = 0·00885 WU/min, or ·53 WU/hour with one cruncher.
1/122 = 0·0082 x 2 = 0·0164 WU/min, or ·98 WU/hour spread over 2 WUs; not 1·06 (2 x ·53).
1/172 = 0·0058 x 4 = 0·023 WU/min, or 1·39 WU/hour spread over 4 WUs; not 1·96 (2 x ·98) nor 2·12 (4 x ·53).

Thus, despite the increased contention slowing each separate cruncher, aggregate production still improves with more crunchers. It was also better to have four CPUs than two bashing the electrons about: two crunchers on a two-CPU system delivered 1·05 WU/hour, more than the ·98 from two crunchers on a four-CPU system but less than the 1·39 from four crunchers on four CPUs. But returns diminish.

Tests with a heterogeneous collection of crunchers will take more patience, more still where an "affinity" option exists, and even more if CPUs 0 and 1 differ from 2 and 3. In the seti@home case more crunchers meant more production (and in other discussions this seemed to hold for many different systems), but this is not always the case. Other crunchers are different, with different mixes of FP versus non-FP work and so on.
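The production-rate arithmetic for the seti@home runs can be reproduced directly from the table. A minimal sketch, assuming each cruncher finishes one work unit per listed wall-clock time and all crunchers run concurrently; the "fraction of linear scaling" column is an extra metric added here for illustration (actual rate divided by that many copies of the single-cruncher rate on the same CPU setting), and `RUNS` and `wu_per_hour` are hypothetical names:

```python
# Wall-clock minutes per work unit, from the seti@home test runs in post #7.
RUNS = [
    # (logical CPUs enabled, concurrent crunchers, minutes per WU)
    (4, 1, 113 + 7 / 60),
    (4, 2, 122 + 14 / 60),
    (4, 4, 172 + 20 / 60),
    (2, 1, 112 + 6 / 60),
    (2, 2, 114 + 7 / 60),
]


def wu_per_hour(crunchers, minutes_per_wu):
    """Aggregate production when `crunchers` jobs run concurrently and each
    completes one WU every `minutes_per_wu` minutes."""
    return crunchers * 60.0 / minutes_per_wu


if __name__ == "__main__":
    for cpus, crunchers, minutes in RUNS:
        rate = wu_per_hour(crunchers, minutes)
        # Single-cruncher time on the same CPU setting, as the baseline.
        single = next(m for c, n, m in RUNS if c == cpus and n == 1)
        efficiency = rate / (crunchers * wu_per_hour(1, single))
        print(f"{cpus} CPUs, {crunchers} crunchers: {rate:.2f} WU/h "
              f"({efficiency:.0%} of linear scaling)")
```

Using the exact minutes and seconds, the four-CPU rows come out at the same ·53, ·98 and 1·39 WU/hour as the hand calculation, and the two-CPU, two-cruncher row at 1·05.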
Fancy CPUs advance the computation along a broad front, with many op-codes in various stages of progress at any given time, interacting with on-chip registers and memory at various levels, and fighting for access to the data transfer bandwidth. Only explicit testing will resolve such questions for particular programmes on particular machines, and it consumes much patience.
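Which logical CPUs share a physical core need not be guessed. On Linux the kernel lists each logical CPU's Hyper-Threading siblings in `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list` (Windows exposes comparable data through `GetLogicalProcessorInformation`). Note also that the two logical CPUs of a Hyper-Threading core are peers, so neither is more "real" than the other; what matters for pinning is which ones share a core. The sketch below, assuming a Linux machine and using hypothetical helper names, reads those files and groups logical CPUs by core:

```python
import glob
import os


def parse_cpu_list(text):
    """Parse a Linux CPU list such as '0,2' or '0-1,8-9' into a sorted list."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def hyperthread_siblings(sysfs="/sys/devices/system/cpu"):
    """Group logical CPUs that share one physical core (Linux sysfs only).
    Returns a sorted list of tuples, e.g. [(0, 2), (1, 3)]; empty if the
    topology files are not present."""
    pattern = os.path.join(sysfs, "cpu[0-9]*", "topology", "thread_siblings_list")
    groups = set()
    for path in glob.glob(pattern):
        with open(path) as f:
            groups.add(tuple(parse_cpu_list(f.read())))
    return sorted(groups)


if __name__ == "__main__":
    for group in hyperthread_siblings():
        print("logical CPUs sharing one core:", group)
```

On a machine like the xw6200 described in #1, the output would show whether the sibling pairs are (0, 1) and (2, 3), as guessed in #7, or (0, 2) and (1, 3), which is what axn's suggestion in #6 would imply.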
|
|
|