mersenneforum.org  

Old 2006-01-19, 21:45   #1
Nicky McLean
 
Jan 2006
Lower Hutt, New Zealand.

3 Posts
Question Getting the most out of the crunch...

I'm flogging an HP workstation xw6200, a Xeon dual cpu 3200MHz with 1GB of memory and hyperthreading activated so that there are "four" cpus. Also crunching are distributed.net's happy cow and climateprediction's model.exe, and I'm watching it all with TaskInfo2002.
So with one instance of Prime95 running and processor affinity set to cpu3, I see cpu3's graphic showing full occupancy (all green), but the cpu utilisation of Prime95 is not given as 25% (or 24.9) but around 16%, and the system idle time is around 35%.
With the processor affinity deactivated (and a stop/continue), no cpu window is filled with green, but the cpu usage number for Prime95 is now 25%, as it is for distributed.net and model.exe.
But looking at the log offered by Prime95, I see that in the affinity state the "M31313641 stage 2 is ~% complete" steps took 2334, 1898 and 1944 secs, whereas with the affinity feature not selected they took 1143 and 1167 secs.

This suggests that I should obtain faster crunching with the affinity feature inactive, which is not as suggested.
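
To put a number on that comparison, here is a minimal sketch that averages the step times quoted above and takes their ratio. The variable names are made up for illustration, and it assumes each logged step covers the same amount of stage 2 work, which the log excerpt does not confirm.

```python
from statistics import mean

# Step times in seconds, copied from the Prime95 log figures quoted in the post.
affinity_on_secs = [2334, 1898, 1944]
affinity_off_secs = [1143, 1167]


def slowdown(slow_samples, fast_samples):
    """Return how many times longer the slow steps take on average."""
    return mean(slow_samples) / mean(fast_samples)


ratio = slowdown(affinity_on_secs, affinity_off_secs)

if __name__ == "__main__":
    print(f"mean with affinity:    {mean(affinity_on_secs):.0f} s")
    print(f"mean without affinity: {mean(affinity_off_secs):.0f} s")
    print(f"affinity-on steps take about {ratio:.2f}x as long")
```

With only three and two samples respectively this is a rough indication, not a benchmark.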
Old 2006-01-19, 23:27   #2
moo
 
Jul 2004
Nowhere

1451₈ Posts

You don't have 4 cpus, you have 4 virtual cpus. There is a big difference: just because it shows 2x the amount of physical cpus doesn't mean it does 200 percent more work. Try stopping all your other clients and run prime95 on proc 0 or 1.
Old 2006-01-25, 04:10   #3
Nicky McLean
 
Jan 2006
Lower Hutt, New Zealand.

3 Posts
Default Debasement of the coinage.

Yes, I know that there are only two real cpus, each with extra registers that allow them to simulate two cpus each, and all of them are contending for the same data path to memory, which is severely slower than the internal memory.
However, tests have shown that although an additional cruncher slows all crunchers (because of additional contention for memory access), nevertheless the nett throughput does increase because there are now more crunchers active. In other words, the slowing factor was less than the additional cruncher factor. This was the case with the seticrunch (in four instances), but I haven't had time to run similar tests with the Mersenne prime cruncher.
Old 2006-01-25, 05:20   #4
ewmayer
2ω=0
 
Sep 2002
República de California

11756₁₀ Posts

If the application in question completely saturates the FPU (as does Prime95), running more instances than there are physical CPUs (i.e. FPUs) will not (in fact cannot) increase throughput.
Old 2006-01-25, 05:28   #5
moo
 
Jul 2004
Nowhere

809 Posts

Hmm, it's like you can't get 2 tomatoes by cutting a tomato that is 1.5 times larger than a regular tomato in half....
Old 2006-01-25, 06:10   #6
axn
 
Jun 2003

1558₁₆ Posts

How does the affinity feature work? If we set affinity, will the OS schedule the process _only_ on that particular CPU or will it try to schedule _as much as possible_ on that particular CPU?


@Nicky McLean: Have you tried setting affinity to either 0 or 1? AFAIK, CPU 0 & 1 are the "real" CPUs. CPU 2 & 3 are "virtual" CPUs.
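
For reference, a minimal sketch of what "setting affinity" looks like at the OS level. It uses Python's `os.sched_setaffinity`, which exists on Linux only, so it is an assumption-laden stand-in and not what Prime95 does on the Windows machine discussed here. On Linux the mask is a hard restriction: the scheduler runs the process only on the listed logical CPUs. The helper name `pin_to_one_cpu` is invented for the example.

```python
import os


def pin_to_one_cpu():
    """Pin the current process to a single logical CPU.

    Returns the resulting set of allowed CPUs, or None where the call is
    unavailable (e.g. Windows, macOS) or refused by the OS.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    try:
        allowed = os.sched_getaffinity(0)   # 0 means "this process"
        target = max(allowed)               # arbitrary choice for the demo
        os.sched_setaffinity(0, {target})   # restrict to that one CPU only
        return os.sched_getaffinity(0)
    except OSError:
        return None


if __name__ == "__main__":
    print(pin_to_one_cpu())
```

Which logical CPU numbers share a physical core is decided by the OS and firmware, so the sketch makes no claim about whether 0 and 1 are the "real" ones.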

Last fiddled with by axn on 2006-01-25 at 06:10
Old 2006-01-31, 23:42   #7
Nicky McLean
 
Jan 2006
Lower Hutt, New Zealand.

3 Posts
Default Affinity...

I tried setting the affinity, and indeed one cpu only of the "four" was active. I haven't tried a variety of tests to report further as to the behaviour with other crunchers running or not.
I guessed that 0 and 1 were the first real cpu, with 2 and 3 the second, but I have no basis for this and axn1's thought is just as valid a possibility. One imagines that each real cpu has its own real floating-point circuitry, but that there are not four of them, merely the appearance of four. Lots of circuitry in a fpu to accommodate all the little tricks to hasten its crunching (and be bungled in the pentium bug) so two sets (in each real cpu) seems less likely, though there may be two sets of registers (in each) to facilitate switching.

To supply some specifics: based on the (now discontinued) seti@home command-line cruncher running on the same work unit, test runs went as follows:
Time      Cpus   Crunchers
113m07s   four   one
122m14s   four   two
172m20s   four   four
112m06s   two    one
114m07s   two    two

Which reduces to these production rates for the 4cpu state:

1/113 = 0·00885 x 1 = 0·00885 WU/min or ·53 WU/hour with one cruncher.
1/122 = 0·0082 x 2 = 0·0164 WU/min or ·98 WU/hour spread over 2 WUs; not 1·06WUs.
1/172 = 0·0058 x 4 = 0·023 WU/min or 1·39 WU/hour spread over 4 WUs; not 1·96 nor 2·12WUs.

Thus, despite the increased contention slowing each separate cruncher, the aggregate production still improves with more crunchers, and it was better to have four cpus than two bashing the electrons about, since two crunchers on a two cpu system delivered 1·05WU/hr, more than the ·98 from two crunchers on a four cpu system but less than the 1·39 of four crunchers on four cpus. But diminishing returns.
Tests with a heterogeneous collection of crunchers will take more patience, more still where "affinity" options exist, and, if cpus 0 and 1 differ from 2 and 3, even more.
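
The same arithmetic can be sketched in a few lines, using the full minutes-and-seconds timings from the table instead of whole minutes, so the last digit may differ slightly from the hand calculation. The function and variable names are illustrative only.

```python
def wu_per_hour(crunchers, minutes, seconds):
    """Aggregate work units per hour when each cruncher takes the given time per WU."""
    elapsed_minutes = minutes + seconds / 60
    return crunchers * 60 / elapsed_minutes


# (visible cpus, crunchers, minutes, seconds), copied from the timing table.
runs = [
    (4, 1, 113, 7),
    (4, 2, 122, 14),
    (4, 4, 172, 20),
    (2, 1, 112, 6),
    (2, 2, 114, 7),
]

rates = {(cpus, n): wu_per_hour(n, m, s) for cpus, n, m, s in runs}

if __name__ == "__main__":
    for (cpus, n), rate in rates.items():
        print(f"{cpus} cpus, {n} cruncher(s): {rate:.2f} WU/hour")
```

Comparing `rates[(4, 4)]` with four times `rates[(4, 1)]` shows the diminishing returns directly.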
In the seti@home case, more crunchers meant more production (and in other discussion this seemed to be so for many different systems), but this is not always the case. Other crunchers are different, with different patterns of FP action versus non-FP and so on. Fancy cpus advance the computation along a broad front, with many op-codes in various stages of progress at any given time, interacting with on-chip registers and memory at various levels, and fighting for access to the data transfer bandwidth.
Only explicit testing will resolve questions, for particular programmes on particular machines, and much patience is consumed.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
