Hyperthreading
My experience with hyperthreading is that it is an excellent way to generate more heat without accomplishing much of anything, if the physical core is already maxed out with the first thread.
I have a first-gen i3, which has two physical cores and four logical cores. Lately, I've been wondering if there is any advantage to running all four workers. I am running two P-1 workers as we speak, but I was wondering if I could run two LL tests instead and then give each core something else to do while it waits for data from memory. The box is a Dell that won't allow me to run anything other than 2x4GB @ 1066MHz with 8-8-8-24 timings, so I would imagine the LL tests would quickly get bottlenecked. Short question: is there a way, that you know of, for me to take advantage of hyperthreading?
Some users report that sieving for NFS has noticeable benefit from hyperthreading.
Would I receive any benefit from hyperthreading LL tests? I have a quad-core processor running four LL tests now, but Task Manager indicates only 51-55 percent CPU usage.
[QUOTE=Batalov;343459]Some users report that sieving for NFS has noticeable benefit from hyperthreading.[/QUOTE]
Indeed, on my four-core Haswell I get something like 16000 seconds per block when running 8 threads and 11000 seconds per block when running four: roughly 40% more blocks per day when running HT. But NFS sieving involves quite a lot of waiting on caches, whilst LL is very carefully written to ensure that the floating-point unit is never waiting, so running two hyperthreads doing LL isn't going to get you anything.
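Taking those figures at face value (reading them as: each of N sieving threads completes one block in the quoted time, which is an interpretation rather than something the post states), the throughput arithmetic works out like this:

```python
seconds_per_day = 86400

# fivemack's reported timings, treated as assumptions:
# HT on:  8 threads, 16000 s per block each
# HT off: 4 threads, 11000 s per block each
blocks_ht = 8 * seconds_per_day / 16000   # 43.2 blocks/day
blocks_no = 4 * seconds_per_day / 11000   # ~31.4 blocks/day
gain = blocks_ht / blocks_no - 1          # ~0.375

print(f"{blocks_ht:.1f} vs {blocks_no:.1f} blocks/day: {gain:.0%} more with HT")
```

So HT buys a bit under 40% extra sieving throughput under these numbers, with each individual thread running markedly slower.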
As fivemack mentioned, the LL code is very highly optimised with 100% utilisation of the FPU. So there's no further efficiency that HT can extract.
In fact, I find my CPU temperatures a bit higher with HT enabled. So if you exclusively run LL and LL Double Checks, you might even want to disable it.
[QUOTE=db597;348495]As fivemack mentioned, the LL code is very highly optimised with 100% utilisation of the FPU. So there's no further efficiency that HT can extract.
In fact, I find my CPU temperatures a bit higher with HT enabled. So if you exclusively run LL and LL Double Checks, you might even want to disable it.[/QUOTE] Not enabled. I like the current extra processor power, so that even when I am doing other things on the computer, tasks are not slow (and P95 seems indifferent, except for a 1 ms increase in iteration time on one or two cores). Temps are fine now at 51-56 degrees Celsius (usually 52 to 53) with 24/7 running. Plus, it is rather warm in my apartment (too expensive to run the AC much when it is so hot outside).
[QUOTE=Primeinator;348510]Not enabled. I like the current extra processor power, so that even when I am doing other things on the computer, tasks are not slow (and P95 seems indifferent, except for a 1 ms increase in iteration time on one or two cores). Temps are fine now at 51-56 degrees Celsius (usually 52 to 53) with 24/7 running. Plus, it is rather warm in my apartment (too expensive to run the AC much when it is so hot outside).[/QUOTE]
This is valid. To clarify: running more LL tests (Prime95 or LLR) than physical cores adds heat but not performance. Running math programs that wait on cache or memory access for large parts of their runtime will get you "free" work. For instance, my laptop i7 gives up about 15% iteration time on 4 LLR instances when I run 4 NFS threads alongside them, while the NFS threads run at over half the speed they manage on an otherwise idle CPU. Lose 15, gain 55. I'll take it.

ECM and P-1 both qualify for this: the same tasks that slow-memory users can run on core #4 to help the project can be tried on HT 'cores' to see if overall production increases. There are also quite a few BOINCified projects in other parts of this forum that would appreciate one HT core while you run LL.

Note that Windows is 'tricked' by HT: reports of 50% CPU use are simply false. The 51-55% readings suggest your regular use is 1-5%, and is being fed to the HT 'cores' while LL keeps running.
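The lose-15-gain-55 bookkeeping above can be written out explicitly. The numbers are the post's own rough estimates, treated here as assumptions:

```python
# Assumptions from the post: LLR iteration time rises ~15% (so LLR
# throughput falls to 1/1.15 of its solo rate), while the NFS threads
# on the HT 'cores' run at a bit over half their solo speed.
llr = 1 / 1.15          # ~0.87 of the LLR-only rate kept
nfs = 0.55              # NFS work gained "for free" on the HT siblings
net = llr + nfs - 1.0   # net change in total work done

print(f"LLR keeps {llr:.0%}, NFS adds {nfs:.0%}: net change {net:+.0%}")
```

Under these assumptions the machine does roughly 40% more total work, which is why pairing LL/LLR with memory-bound work on the HT siblings can pay off even though HT never helps LL alone.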
I don't really appreciate Windows' account of CPU usage. I wouldn't mind a more comprehensive analysis, i.e. some kind of advanced mode. I know the casual computer user gets a kick out of trying to get the CPU usage widget to go above 20% on their brand-new box, but very few people really have any business needing to know what their CPU usage is. Those who do probably realize that it is sometimes quite wrong.
To create my worktodo.txt for 50,000 large exponents to 66 bits I was doing a bit of copying and pasting, and yes, 50,000 lines is a lot, but Word 2010 (the find-and-replace function is so nice) was using "25%" for a minute or so. Baloney. The CPU must have been spending 99.999% of its time waiting on something else, probably the memory, since Prime95 has it maxed out already. Accessing the "paste" data must have taken a while if it had to compete with 4 Prime95 workers on a fast CPU, but Task Manager decided that the one core was completely busy with Word. A memory bandwidth usage indicator could be interesting and useful.
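The "25%" reading follows directly from how Task Manager averages logical cores: one pegged single-threaded process shows up as 100/N percent overall, and a core stalled on memory still counts as 100% busy. A minimal sketch:

```python
# Overall CPU % is just busy logical cores divided by total logical
# cores; Windows cannot tell a core doing real work from one that is
# spinning or stalled waiting on memory.
def overall_usage(pegged_cores: int, logical_cores: int) -> float:
    return 100.0 * pegged_cores / logical_cores

for logical in (2, 4, 8):   # dual-core, quad-core, quad-core with HT
    pct = overall_usage(1, logical)
    print(f"{logical} logical cores, 1 pegged: {pct:.1f}% overall")
```

So single-threaded Word pegging one core of a quad-core reads exactly 25%, regardless of how much of that time is memory stalls.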
Prime95 does run at low priority, and Word may well be single-threaded, and as such unable to run on more than one core.
So yeah, it's possible it did cap a core. :)
I know it runs at a lower priority, but does that include memory access?
[QUOTE=TheMawn;349338]I don't really appreciate Windows' account of CPU usage. I wouldn't mind a more comprehensive analysis, i.e. some kind of advanced mode. I know the casual computer user gets a kick out of trying to get the CPU usage widget to go above 20% on their brand-new box, but very few people really have any business needing to know what their CPU usage is. Those who do probably realize that it is sometimes quite wrong.
To create my worktodo.txt for 50,000 large exponents to 66 bits I was doing a bit of copying and pasting, and yes, 50,000 lines is a lot, but Word 2010 (the find-and-replace function is so nice) was using "25%" for a minute or so. Baloney. The CPU must have been spending 99.999% of its time waiting on something else, probably the memory, since Prime95 has it maxed out already. Accessing the "paste" data must have taken a while if it had to compete with 4 Prime95 workers on a fast CPU, but Task Manager decided that the one core was completely busy with Word. A memory bandwidth usage indicator could be interesting and useful.[/QUOTE] SysInternals [URL="http://technet.microsoft.com/en-us/sysinternals/bb896653"]Process Explorer[/URL] gives CPU usage for just about everything, to two decimal places. It gives memory usage, but not bandwidth usage, AFAIK. SysInternals might be just the place to look for such a thing, though. EDIT: Searching [URL="http://technet.microsoft.com/en-us/sysinternals/bb545027"]this index[/URL] might turn up something on the memory front.
Well, if you were memory-bound on the chip and Prime95 wouldn't give up the bus, then I'd expect the CPU usage to drop. The exact particulars of how Windows decides what time to give which thread are not well known, but it is unlikely that the search-and-replace used much memory; it replaces items as it goes. Unless you mean system RAM, which Prime95 would not give up, then yeah, all signs point to yes: CPU memory would be allocated to the thread with current priority on the related CPU. RAM shouldn't come into play for a search-and-replace; the OS itself would be stuttering if you couldn't free up that much memory on demand.
What I meant was if Prime95 was using up all the memory [I]bandwidth[/I], not the memory itself.
If my RAM can provide 20 GB/s of bandwidth and Prime95 wants 25 GB/s and Word wants 10 MB/s, does Word have to fight tooth and nail for it, or does Prime95 give up the [I]bandwidth[/I] like it does CPU cycles?
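For what it's worth, OS thread priority governs CPU time slices only; the memory controller arbitrates outstanding requests in hardware with no notion of process priority, so a low-priority Prime95 does not "yield" bandwidth the way it yields cycles. One crude way to feel the contention (a sketch, assuming NumPy is installed; run it with Prime95 stopped, then again with it running) is to time a streaming pass over a buffer much larger than any last-level cache:

```python
import time
import numpy as np

N = 32 * 1024 * 1024   # 32M float64 = 256 MB, well past any LLC
a = np.ones(N)
a.sum()                # warm-up pass so the timed run below is steady

t0 = time.perf_counter()
total = a.sum()        # streaming read of the whole 256 MB buffer
dt = time.perf_counter() - t0

gb = a.nbytes / 1e9
print(f"read {gb:.2f} GB in {dt * 1e3:.1f} ms -> {gb / dt:.1f} GB/s")
```

The GB/s figure this prints should drop noticeably while Prime95's workers are saturating the memory bus, even though Task Manager still shows the Python process as a fully "busy" core.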