Setting affinity should not be necessary
|
[QUOTE=Prime95;553822]Setting affinity should not be necessary[/QUOTE]
Yes, but please answer. I ask because this CPU has 12 cores. Can I use this scheme (under Linux)?
AffinityScramble2=0123456789ABCDEFGHIJKLMNOPQRSTUV
[Worker #1]
Affinity=0,1
[Worker #2]
Affinity=2
[Worker #3]
Affinity=3,4
[Worker #4]
Affinity=5
[Worker #5]
Affinity=6,7
[Worker #6]
Affinity=8
[Worker #7]
Affinity=9,A
[Worker #8]
Affinity=B
Does this look correct? |
[QUOTE=pepi37;553921]Yes, but please answer.
I ask because this CPU has 12 cores. Can I use this scheme (under Linux)?
AffinityScramble2=0123456789ABCDEFGHIJKLMNOPQRSTUV
[Worker #1]
Affinity=0,1
[Worker #2]
Affinity=2
[Worker #3]
Affinity=3,4
[Worker #4]
Affinity=5
[Worker #5]
Affinity=6,7
[Worker #6]
Affinity=8
[Worker #7]
Affinity=9,A
[Worker #8]
Affinity=B
Does this look correct?[/QUOTE]
You might actually get better performance if you cut this back to four workers:
[Worker #1]
Affinity=0,(1,2)
[Worker #2]
Affinity=3,(4,5)
[Worker #3]
Affinity=6,(7,8)
[Worker #4]
Affinity=9,(A,B)
4 workers with 2 helpers each. :smile: |
[QUOTE=pepi37;553921]Yes, but please answer.
I ask because this CPU has 12 cores. Can I use this scheme (under Linux)?
AffinityScramble2=0123456789ABCDEFGHIJKLMNOPQRSTUV
[Worker #1]
Affinity=0,1
[Worker #2]
Affinity=2
[Worker #3]
Affinity=3,4
[Worker #4]
Affinity=5
[Worker #5]
Affinity=6,7
[Worker #6]
Affinity=8
[Worker #7]
Affinity=9,A
[Worker #8]
Affinity=B
Does this look correct?[/QUOTE]
AffinityScramble is deprecated. Your settings are OK except "9,A" should be "9,10" and B should be 11. |
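The correction above (decimal logical CPU numbers, so 10 and 11 rather than A and B) can be sketched as a small Python generator. This is only an illustration: the function name `affinity_sections` is hypothetical, and the output format simply mirrors the `[Worker #n]` / `Affinity=` lines quoted in this thread rather than any documented grammar.

```python
def affinity_sections(groups):
    """Render one [Worker #n] section per group of logical CPU numbers.

    Assumption: the layout copies the lines quoted in this thread;
    CPU numbers are always written in decimal.
    """
    lines = []
    for worker, cpus in enumerate(groups, start=1):
        lines.append(f"[Worker #{worker}]")
        lines.append("Affinity=" + ",".join(str(cpu) for cpu in cpus))
    return "\n".join(lines)


# The 8-worker layout asked about above, on 12 cores, in decimal only.
layout = [[0, 1], [2], [3, 4], [5], [6, 7], [8], [9, 10], [11]]
text = affinity_sections(layout)
print(text)
```

The last two sections come out as `Affinity=9,10` and `Affinity=11`, matching the correction.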
[QUOTE=storm5510;553945]You might actually get better performance if you cut this back to four workers:
[Worker #1]
Affinity=0,(1,2)
[Worker #2]
Affinity=3,(4,5)
[Worker #3]
Affinity=6,(7,8)
[Worker #4]
Affinity=9,(A,B)
4 workers with 2 helpers each. :smile:[/QUOTE]
Since there is no rush in doing a CRUS sequence at home, I am concentrating on the best output, and 4 workers with 3 cores each is not that in this case. But thanks for the advice. |
[QUOTE=pepi37;554180]Since there is no rush in doing a CRUS sequence at home, I am concentrating on the best output, and 4 workers with 3 cores each is not that in this case. But thanks for the advice.[/QUOTE]
You are most welcome. George is correct. You should not have to use the "Affinity" settings. Closer to the top of [I]local.txt[/I], you [U]might[/U] have something like this:
[QUOTE]WorkerThreads=x
CoresPerTest=x[/QUOTE]
Where [I]x[/I] is a number.
I started using [I]Prime95[/I], the Windows equivalent of [I]mprime[/I], in 2005. To date, [I]Prime95[/I] extremely rarely uses more than 50% of a CPU's capability. I have tried to force it in the past without success. You could set [I]WorkerThreads=12[/I] and [I]CoresPerTest=1[/I], but I doubt it would use this much. Something has to be left for the operating system and other background processes.
What I have is an i7. Four physical cores and four logical cores. When I put this together, a long-time member here suggested I use one worker thread and four cores per test, the worker being one of the four. This is 50% of the CPU's capacity. I have no "Affinity" settings.
You can experiment with those two settings until you find what you feel does the best. If those two settings are not there, then you can add them. They are case-sensitive and must be written as I have them above. |
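For anyone experimenting with those two settings, a minimal Python sketch can list the even ways to split a given number of physical cores between workers. The helper name `even_splits` is hypothetical, and the sketch assumes that WorkerThreads times CoresPerTest should equal the physical core count. It says nothing about which split is fastest; that still has to be benchmarked.

```python
def even_splits(physical_cores):
    """Return (workers, cores_per_worker) pairs that use every core evenly."""
    return [
        (workers, physical_cores // workers)
        for workers in range(1, physical_cores + 1)
        if physical_cores % workers == 0
    ]


# The 12-core CPU discussed in this thread.
for workers, cores in even_splits(12):
    print(f"WorkerThreads={workers} CoresPerTest={cores}")
```

An uneven layout, such as the 8-worker scheme earlier in the thread that mixes 2-core and 1-core workers, is outside what this sketch covers.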
[QUOTE=storm5510;554194]To date, [I]Prime95[/I] extremely rarely uses more than 50% of a CPU's capability.[/QUOTE]
You are very mistaken. Prime95 uses more of a CPU's capability than nearly any other software, period. It is so efficient at using available capacity that using hyperthreads makes the software run slower. Even so, your belief that it's only using 50% of the CPU capability reflects a severe misunderstanding of what logical processors are, a misunderstanding you ought to remedy. To wit: If you have a 4-lane bridge with 8 lanes of highway that merge into the bridge, you claim the bridge is only half used when I close 4 highway lanes and leave 4 lanes open to flow smoothly onto the bridge. Is the bridge half-used? |
3 Attachment(s)
Prime95, mprime, and other GIMPS primality test codes are typically memory bandwidth limited. So say their authors. George has given examples of using MORE instructions to do FEWER data memory transfers. TF is less demanding of memory bandwidth. I've found I can run Ernst's Mfactor program on most HT cores alongside, for exponents too large to factor with mfaktx, with modest impact on prime95's primality test throughput, ~15%. Hyper-threading ([URL]https://en.wikipedia.org/wiki/Hyper-threading[/URL]) does not increase memory bandwidth; only certain parts of a core are duplicated. Intel indicated a 15-30% performance increase from HT, not 100%.
I've burned out two motherboards, with an i7-4790 running prime95 and its igp running mfakto at full tilt at the same time. Seems like rather full utilization to me, to be able to take the chip beyond the power rating the board was designed for and presumably would tolerate. The Windows Task Manager display of core utilization, which for n cores with HT shows 2n core utilization graphs, can be misleading. Compare the attached Task Manager performance pane screen captures of a 4-core & HT i7-4790 and a dual-6-core-Xeon-x5650 (no HT), each running prime95 at its optimally benchmarked configuration. On the i7-4790, with 1 worker on 4 cores and no HT use by prime95 primality testing, 50% cpu utilization is indicated for prime95 and 13% for gpuowl, 63% total. On the x5650s, with 2 workers of 6 cores each (one per chip package), the prime95 process indicates 98-99% cpu utilization. Also on a Core 2 Duo, no HT, the prime95 process indicates 98% cpu utilization. Prime95 automatically handles the helper thread core affinity for us. |
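The Task Manager figures above follow from simple arithmetic, sketched here in Python. This is a simplified model that assumes the indicated percentage is busy logical processors divided by total logical processors; the function name `indicated_utilization` is hypothetical, and real readings (98-99% in the screenshots) also reflect other processes and sampling.

```python
def indicated_utilization(busy_threads, physical_cores, threads_per_core):
    """Percentage of logical processors that are busy (simplified model)."""
    logical_processors = physical_cores * threads_per_core
    return 100.0 * busy_threads / logical_processors


# i7-4790: 4 cores with HT (8 logical), prime95 keeps 4 threads busy.
print(indicated_utilization(4, 4, 2))    # 50.0

# Dual 6-core Xeon without HT: 12 logical, prime95 keeps all 12 busy.
print(indicated_utilization(12, 12, 1))  # 100.0
```

In both cases every physical core is fully occupied; only the denominator changes when HT doubles the number of logical processors.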
[QUOTE=storm5510;554194]Something has to be left for the operating system and other background processes.[/QUOTE]Prime95 runs at low priority and is preempted by the OS or assorted user processes. On a HT system, the virtual cores can come into play and lessen the impact. A prime95 worker is preempted as needed. This happens with gpuowl's GCD on a cpu core for example; one prime95 worker yields to it.
|
1 Attachment(s)
With everything above read, explain this (image attached). This is all I can get using multiple configurations. I only use HT on [U]recommended[/U] processes. On others, it seems to have no effect. This is on a P-1, if that makes any difference. So, what is it that I have done wrong all this time?
|
[QUOTE=storm5510;554260]So, what is it that I have done wrong all this time?[/QUOTE]
That looks perfect! It shows each physical core has a prime95 worker on it. |