Prime95 24.14 and hyperthreading
I know this has been asked many times, but:
Some time ago, I noticed that my task manager was displaying two processors even though I only have one. From what I hear, this is due to hyperthreading. While it appears that Prime95 is using only 50% of the CPU resources, it is actually using 100%. Does this mean that it is not possible to run another instance of Prime95? |
You can, but I don't think you will get any more throughput from it.
|
[QUOTE=ixfd64;145423]I know this has been asked many times, but:
Some time ago, I noticed that my task manager was displaying two processors even though I only have one. From what I hear, this is due to hyperthreading. While it appears that Prime95 is using only 50% of the CPU resources, it is actually using 100%. Does this mean that it is not possible to run another instance of Prime95?[/QUOTE] You could run another instance of Prime95, but what hyperthreading does is to provide a 'second processor' which uses the resources not being used by the program running on the main processor. Since Prime95 is exceptionally well-written and uses all the resources available in about as efficient a manner as possible, there's almost nothing left for the 'second processor'. |
[QUOTE=fivemack;145437]You could run another instance of Prime95 but... ..there's almost nothing left for the 'second processor'[/QUOTE]
I run two instances of Prime95 on a hyperthreaded P4. I reckon I get about 5-10% more throughput than with a single instance. But it's slightly more work to make sure the two threads don't argue with each other. Richard |
You can switch to 25.7 and use 2 threads on the same number.
|
I've heard that running two instances on a hyperthreaded machine will, as was stated once or twice already in this thread, squeeze another 5-10% of throughput out of your machine, though usually at the cost of a slightly higher CPU temperature. I personally would probably run two instances most of the time, though back when I had a hyperthreaded P4 (since upgraded to a Core 2 Duo) I would sometimes just run one instance for convenience' sake. :smile:
|
On my Northwood P4 I have found that for LL testing with large FFT sizes there is less than 1% throughput to be gained, but for very small FFT sizes with ECM there can be more than 25% gain from running two hyperthreaded processes.
Best to try it out on your own machine and make the comparison for yourself. |
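For anyone making that comparison, here is a minimal sketch of the arithmetic, assuming you note the per-iteration time Prime95 reports with one instance running alone and then with two instances running together on the same kind of work. The function name and the sample timings are hypothetical, not measurements from this thread.

```python
def throughput_gain(single_iter_ms: float, dual_iter_ms: float) -> float:
    """Fractional throughput change from running two instances instead of one.

    single_iter_ms: per-iteration time with one instance running alone.
    dual_iter_ms:   per-iteration time of each instance when two run together
                    (assumed to be the same for both instances).
    """
    if single_iter_ms <= 0 or dual_iter_ms <= 0:
        raise ValueError("iteration times must be positive")
    single_rate = 1.0 / single_iter_ms   # iterations per ms, one instance
    dual_rate = 2.0 / dual_iter_ms       # combined iterations per ms, two instances
    return dual_rate / single_rate - 1.0


# Hypothetical timings for illustration only.
gain = throughput_gain(single_iter_ms=50.0, dual_iter_ms=94.0)
print(f"throughput change: {gain:+.1%}")
```

A result near zero means the second instance adds nothing; a negative result means the two instances are getting in each other's way. The comparison is only fair if both runs use the same FFT size and work type.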
[quote=geoff;145507]On my Northwood P4 I have found that for LL testing with large FFT sizes there is less than 1% throughput to be gained, but for very small FFT sizes with ECM there can be more than 25% gain from running two hyperthreaded processes.
Best to try it out on your own machine and make the comparison for yourself.[/quote] Hmm...interesting. It would seem that running two threads instead of one makes less of a difference for large FFTs than it does for small FFTs. This would be consistent with my personal experience, as the vast majority of the FPU-intensive distributed computing work that I have done has involved small FFTs (such as LLR, PRP, etc., all of which primarily use small FFTs at their current overall testing levels). Anyone else notice a similar pattern of diminishing returns from hyperthreading as FFT sizes get bigger? |
[quote=mdettweiler;145514]Anyone else notice a similar pattern of diminishing returns from hyperthreading as FFT sizes get bigger?[/quote]If so, that would be in accordance with the increasing percentage of overall time that the software spends in the main FP compute loops for larger FFTs. A pipeline-full of floating-point instructions for one hyperthread would allow non-FP overhead instructions to be done expeditiously in the second hyperthread, for as long as there were enough non-FP instructions to be executed there before the second hyperthread loaded up the FP pipeline from its side or the first one's FP-pipe finished, that is.
|
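The argument in the post above can be turned into a toy model. This is only a sketch under strong assumptions: each unit of work takes time 1 on a single thread, a fraction `fp_fraction` of that time fully occupies the shared FP pipeline, and the remaining non-FP time overlaps perfectly with the other hyperthread's FP work. Real processors also share caches, decode bandwidth, and memory bandwidth, so treat the result as an idealized upper bound. The function name and the sample fractions are made up for illustration and are not Prime95 measurements.

```python
def ideal_ht_gain(fp_fraction: float) -> float:
    """Upper bound on the throughput gain from a second hyperthread.

    fp_fraction: share of single-thread run time spent saturating the
                 shared floating-point pipeline (0.0 to 1.0).
    """
    if not 0.0 <= fp_fraction <= 1.0:
        raise ValueError("fp_fraction must be between 0 and 1")
    # Two units of work need 2 * fp_fraction of FP-pipeline time in total,
    # and neither thread can finish its own unit in less than time 1.
    two_thread_time = max(1.0, 2.0 * fp_fraction)
    # One thread alone finishes 1 unit per time 1; two threads finish
    # 2 units per two_thread_time.
    return 2.0 / two_thread_time - 1.0


# Illustrative fractions only: FP-heavy work leaves little room for a gain.
for f in (0.99, 0.90, 0.80):
    print(f"FP fraction {f:.2f}: at most {ideal_ht_gain(f):.1%} gain")
```

Under these assumptions the possible gain shrinks toward zero as the FP share approaches 100%, which is the direction of the trend described for larger FFTs.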
[quote=cheesehead;145572]If so, that would be in accordance with the increasing percentage of overall time that the software spends in the main FP compute loops for larger FFTs. A pipeline-full of floating-point instructions for one hyperthread would allow non-FP overhead instructions to be done expeditiously in the second hyperthread, for as long as there were enough non-FP instructions to be executed there before the second hyperthread loaded up the FP pipeline from its side or the first one's FP-pipe finished, that is.[/quote]
So...with this in mind, I wonder how one thread vs. two threads would compare for low-FPU-usage tasks such as trial factoring or sieving? Has anyone ever compared these? |
[QUOTE=cheesehead;145572]If so, that would be in accordance with the increasing percentage of overall time that the software spends in the main FP compute loops for larger FFTs. A pipeline-full of floating-point instructions for one hyperthread would allow non-FP overhead instructions to be done expeditiously in the second hyperthread, for as long as there were enough non-FP instructions to be executed there before the second hyperthread loaded up the FP pipeline from its side or the first one's FP-pipe finished, that is.[/QUOTE]
Hyperthreading can sometimes work well even when both processes are executing the same type of instructions. For example if two 32-bit processes are executing instructions that operate on the lower 32 bits of an %xmm register, then I think hyperthreading can combine both instructions and execute them together as one 2x32-bit vector operation. |
[quote=geoff;145624]Hyperthreading can sometimes work well even when both processes are executing the same type of instructions. For example if two 32-bit processes are executing instructions that operate on the lower 32 bits of an %xmm register, then I think hyperthreading can combine both instructions and execute them together as one 2x32-bit vector operation.[/quote]
So, two similar processes of a sieve/TF program running simultaneously, or two LL/LLR/PRP/etc. processes running on similarly-sized numbers, would gain a greater advantage than if you were running, say, one sieve/TF process and one LL/LLR/PRP process? |
[QUOTE=mdettweiler;145626]So, two similar processes of a sieve/TF program running simultaneously, or two LL/LLR/PRP/etc. processes running on similarly-sized numbers, would gain a greater advantage than if you were running, say, one sieve/TF process and one LL/LLR/PRP process?[/QUOTE]
Not necessarily; it depends on how the programs are written. Just because two programs are making heavy use of the same type of instructions doesn't always mean that nothing will be gained by hyperthreading. Some programs work best when running two copies of the same program; others work better when run together with a completely different program. Yet others are not suitable for hyperthreading at all, usually because they are so tightly coded that there are no spare resources to share with another process. I don't understand enough about hyperthreading to predict how two programs will work together, so I just recommend experimenting for yourself. |
[quote=geoff;145630]Not necessarily; it depends on how the programs are written. Just because two programs are making heavy use of the same type of instructions doesn't always mean that nothing will be gained by hyperthreading. Some programs work best when running two copies of the same program; others work better when run together with a completely different program. Yet others are not suitable for hyperthreading at all, usually because they are so tightly coded that there are no spare resources to share with another process.
I don't understand enough about hyperthreading to predict how two programs will work together, so I just recommend experimenting for yourself.[/quote] Oh, I see--apparently I made a little blooper when reading your earlier post and thought you said that hyperthreading generally performs [i]best[/i] when run with two similar applications. :blush: Now I get it. :smile: Unfortunately I don't have a hyperthreaded CPU any more (though in a way that's fortunate, since my hyperthreaded P4 was replaced with a much faster Core 2 Duo after it got fried by mysterious forces), so I can't test any of this out for myself. I'm speaking entirely from experience with my old CPU, which admittedly isn't much, since for most of that CPU's tenure I was somewhat mistaken about how hyperthreading actually worked. :smile: |
Question: is it more efficient to run one copy of Prime95 on both cores of a dual-core machine for 2 hours a day, or one copy on each core for 1 hour a day each? (I ask this because factoring apparently doesn't produce a "helper thread", unlike primality testing. Speaking of which, how does the helper thread work while running primality tests?)
|