[QUOTE=Prime95;454783]29.1 assigns affinity to both hyperthreads on a core. That is, if prime95 decides to assign to core #0, it assigns it to logical cores #0 and #32. My thought is that assigning to either logical CPU should result in the same performance.
There is a way to override prime95's affinity setting algorithm in local.txt -- much better than the ugly AffinityScramble settings. You can be the first to test it! Create two sections in local.txt: [Worker #1] and [#Worker #2]. Then use this syntax copied from comments in the code: [CODE]/* Parse affinity settings specified in the INI file. */
/* We accept several syntaxes in an INI file: */
/*   3,6,9          Run main worker thread on logical CPU #3, run two aux threads on logical CPUs #6 & #9 */
/*   3-4,5-6        Run main worker thread on logical CPUs #3 & #4, run aux thread on logical CPUs #5 & #6 */
/*   {3,5,7},{4,6}  Run main worker thread on logical CPUs #3, #5, & #7, run aux thread on logical CPUs #4 & #6 */
/*   (3,5,7),(4,6)  Run main worker thread on logical CPUs #3, #5, & #7, run aux thread on logical CPUs #4 & #6 */
/*   [3,5-7],(4,6)  Run main worker thread on logical CPUs #3, #5, #6, & #7, run aux thread on logical CPUs #4 & #6 */[/CODE][/QUOTE]Instead of the brackets and commas you could use +[code]/* 3+5-7,4+6  Run main worker thread on logical CPUs #3, #5, #6, & #7, run aux thread on logical CPUs #4 & #6 */[/code]
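To make the accepted syntaxes concrete, here is a minimal sketch in Python of how a comma/range affinity list such as `3,5-7` might be expanded into logical CPU numbers. This is an illustrative re-implementation for experimentation only, not Prime95's actual parsing code (which is C and also handles the `{}`, `()`, and `[]` grouping forms).

```python
def expand_affinity(spec):
    """Expand a Prime95-style affinity list such as "3,5-7" into a
    sorted list of logical CPU numbers.

    Hypothetical helper for illustration; not Prime95's code."""
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))   # inclusive range
        else:
            cpus.add(int(part))
    return sorted(cpus)

# "3,5-7" denotes logical CPUs 3, 5, 6 and 7
print(expand_affinity("3,5-7"))  # [3, 5, 6, 7]
```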
[QUOTE=retina;454784]Instead of the brackets and commas you could use +[code]/* 3+5-7,4+6 Run main worker thread on logical CPUs #3, #5, #6, & #7, run aux thread on logical CPUs #4 & #6 */[/code][/QUOTE]
I would argue that's too clever for its own good. Rather less readable and rather more unintuitive.
[QUOTE=Dubslow;454785]I would argue that's too clever for its own good. Rather less readable and rather more unintuitive.[/QUOTE]Perhaps.
But with proper use of spacing, and allowing one character that is ignored:[code]3+5-->9, 4+6 // the ">" is ignored and more than one "-" is allowed also.[/code]
[QUOTE]There is a way to override prime95's affinity setting algorithm in local.txt[/QUOTE]
In what versions do these additions have effect? Also, I am unclear as to what should actually be on a line below [Worker #1]. If I wanted my i7 quad to run Prime95 on cores 1, 2, 5, 7 (single worker, 4 threads), how should I state this? (The numbers given assume there is a core 0.) Finally, is the first # in [[B][COLOR=Red]#[/COLOR][/B]Worker #2] supposed to be there? I have really wanted a way to definitively assign cores in P95. When the work of a single thread is split between paired hyperthreads, there is a definite performance hit.
My in-dev version of Mlucas has a not-dissimilar explicit-affinity option, of the form -cpu [], where [] stands for a set of core IDs, e.g. 3,4,6,7, or a lo:hi:stride triplet, e.g. 0:31:2, which means run on the 16 even-indexed cores < 32.
I don't have a separate auxiliary task needing such specification, but if I did I would probably simply add a second -aux flag with the same numerical syntax.
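The lo:hi:stride triplet form described above can be sketched as follows (hypothetical Python for illustration; Mlucas itself is written in C and its actual parser may differ):

```python
def expand_triplet(spec):
    """Expand an Mlucas-style lo:hi:stride core spec, e.g. "0:31:2",
    into the list of core IDs it denotes (hi taken as inclusive).

    Illustrative sketch only, not Mlucas's implementation."""
    lo, hi, stride = (int(x) for x in spec.split(":"))
    return list(range(lo, hi + 1, stride))

# "0:31:2" denotes the 16 even-indexed cores below 32
print(expand_triplet("0:31:2"))
```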
A quick post before I forget again...
In testing, I ran my 1700 at stock (3.2 GHz all core loaded) with the latest P95 test version using FMA. I benchmarked no HT at 64k FFT. Overclocked to 3.6 GHz 1.20v, benchmarked same. For 12.5% increase in clock speed, I was seeing only about 8% improvement in score (I don't have the values on me). This was run multiple times in case of a bad run. I didn't use HT as that gave lower scores when on. On Intel systems, I'd expect to see near perfect clock scaling. The tasks should be small enough not to be affected by ram speed. Has anyone else experienced similar? This isn't necessarily limited to P95, as I saw similar in Cinebench R15 scores. Past testing only shows a very weak dependency on ram bandwidth on Intel systems.
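The scaling gap in the numbers above can be quantified: going from 3.2 to 3.6 GHz is a 12.5% clock increase, so perfect clock scaling would predict ~12.5% more throughput rather than the ~8% observed (figures from the post; the efficiency number is just their ratio):

```python
# Clock-scaling efficiency from the figures quoted in the post
base_clock, oc_clock = 3.2, 3.6          # GHz, all-core loaded
clock_gain = oc_clock / base_clock - 1   # fractional clock increase
print(f"clock increase: {clock_gain:.1%}")        # 12.5%

observed_gain = 0.08                     # ~8% throughput improvement reported
scaling_efficiency = observed_gain / clock_gain
print(f"scaling efficiency: {scaling_efficiency:.0%}")
```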
[QUOTE=mackerel;454794]A quick post before I forget again...
In testing, I ran my 1700 at stock (3.2 GHz all core loaded) with the latest P95 test version using FMA. I benchmarked no HT at 64k FFT. Overclocked to 3.6 GHz 1.20v, benchmarked same. For 12.5% increase in clock speed, I was seeing only about 8% improvement in score (I don't have the values on me). This was run multiple times in case of a bad run. I didn't use HT as that gave lower scores when on. On Intel systems, I'd expect to see near perfect clock scaling. The tasks should be small enough not to be affected by ram speed. Has anyone else experienced similar? This isn't necessarily limited to P95, as I saw similar in Cinebench R15 scores. Past testing only shows a very weak dependency on ram bandwidth on Intel systems.[/QUOTE] If I understand correctly, a 64k FFT will need about 512 KB of memory. So you'll be bumping out of L2 occasionally. Does the same happen if you try a 32k FFT?
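The 512 KB figure follows from the data size alone: a 64k-point FFT on double-precision values is 64×1024 elements of 8 bytes each, exactly the size of a Ryzen core's 512 KB L2 (this estimate ignores twiddle tables and scratch space, which push the real working set somewhat higher):

```python
# Approximate working-set size of a 64k-point FFT on doubles
fft_length = 64 * 1024        # points
bytes_per_double = 8
working_set = fft_length * bytes_per_double
print(working_set // 1024, "KB")  # 512 KB -- right at the L2 boundary
```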
I can try later, but regardless of whether it sits in L2 or not, it will fit in L3, as on Intel. I assume the L3 is still fast enough...
[QUOTE=mackerel;454794]A quick post before I forget again...
In testing, I ran my 1700 at stock (3.2 GHz all core loaded) with the latest P95 test version using FMA. I benchmarked no HT at 64k FFT. Overclocked to 3.6 GHz 1.20v, benchmarked same. For 12.5% increase in clock speed, I was seeing only about 8% improvement in score (I don't have the values on me). This was run multiple times in case of a bad run. I didn't use HT as that gave lower scores when on. On Intel systems, I'd expect to see near perfect clock scaling. The tasks should be small enough not to be affected by ram speed. Has anyone else experienced similar? This isn't necessarily limited to P95, as I saw similar in Cinebench R15 scores. Past testing only shows a very weak dependency on ram bandwidth on Intel systems.[/QUOTE] I have noticed it in my own code, but those workloads don't fit in the cache. I don't know if Zen has a separate clock for the caches (the fabric) like Intel does. Perhaps that's coming into play?
[QUOTE=kladner;454789]In what versions do these additions have effect? Also, I am unclear as to what should actually be on a line below [Worker #1].
If I wanted my i7 quad to run Prime95 on cores 1, 2, 5, 7 (single worker, 4 threads), how should I state this? (The numbers given assume there is a core 0.)[/quote] This feature is new in 29.1. Yes, I left out some vital information. You would enter this: [code][Worker #1]
Affinity=1,2,5,7[/code][quote]Finally, is the first # in [[B][COLOR=Red]#[/COLOR][/B]Worker #2] supposed to be there?[/quote] Yes, the first # was a typo. [quote]I have really wanted a way to definitively assign cores in P95. When the work of a single thread is split between paired hyper threads, there is a definite performance hit.[/QUOTE] Assuming logical CPUs 0 and 1 are on the same physical core, are you saying that a program will get better performance setting a thread's affinity to logical CPU 0 rather than to logical CPUs 0 and 1?
[QUOTE=Prime95;454808]This feature is new in 29.1. Yes, I left out some vital information. You would enter this:
[Worker #1] Affinity=1,2,5,7 Yes, the first # was a typo. Assuming logical CPUs 0 and 1 are on the same physical core, are you saying that a program will get better performance setting a thread's affinity to logical CPU 0 rather than logical CPUs 0 and 1?[/QUOTE] Thanks for the clarification on syntax. Where can I get the current version of 29.x? I was only avoiding CPU 0 because of the possible interrupt issues. As I write this, I realize that 0 and 1 are essentially the same. My only real concern is getting affinity locked to 0 or 1, 2 or 3, 4 or 5, and 6 or 7. I can use Task Manager to set affinity to the cores which appear to have been selected by Prime95. If my selections are correct, there is about a 2% increase in P95 DC performance. However, this correlation between P95's selections and mine may not persist. Sometimes P95 stops using a Task Manager-defined core, without moving to the associated hyperthread. I have to reset affinity for "all cores" to recover.