Why is factoring designed to be single-core?
Whenever I run factoring, it's limited to a "one core job", even if I assign more threads.
For example, when I set up factoring on a Core2 Duo (1 worker/2 threads), it uses both cores but totals only 50% CPU usage, leaving 50% idle. On a 1055t X6 (1 worker/6 threads), it uses all cores but totals only 16.66% CPU usage. Is there a reason for that? Are there plans to redesign it to use 100% of all cores? |
[QUOTE=otutusaus;237479]Whenever I run factoring, it's limited to a "one core job", even if I assign more threads.
For example, when I set up factoring on a Core2 Duo (1 worker/2 threads), it uses both cores but totals only 50% CPU usage, leaving 50% idle. On a 1055t X6 (1 worker/6 threads), it uses all cores but totals only 16.66% CPU usage. Is there a reason for that? Are there plans to redesign it to use 100% of all cores?[/QUOTE]
Your question assumes facts that are false. |
[QUOTE=R.D. Silverman;237482]Your question assumes facts that are false.[/QUOTE]
Can you point out what is false? |
[QUOTE=otutusaus;237483]Can you point out what is false?[/QUOTE]
Your assumption that less than 100% of the processing capability is being used. |
Do you mean that there is a bottleneck when factoring that doesn't allow the use of more processing capability? My task manager is telling me that CPU usage in my 1055t X6 is 16.66%.
|
[QUOTE=R.D. Silverman;237484]Your assumption that less than 100% of the processing capability is being used.[/QUOTE]
Assuming he's using Prime95's TF, this is an accurate assumption. This isn't a case where someone might think that only 50% of the processing capability is being used due to hyperthreading; it's really only using one core out of two or six.
[QUOTE=otutusaus;237479]Whenever I run factoring, it's limited to a "one core job", even if I assign more threads. For example, when I set up factoring on a Core2 Duo (1 worker/2 threads), it uses both cores but totals only 50% CPU usage, leaving 50% idle. On a 1055t X6 (1 worker/6 threads), it uses all cores but totals only 16.66% CPU usage. Is there a reason for that? Are there plans to redesign it to use 100% of all cores?[/QUOTE]
First, I'll note that you are correct that the TF of Prime95 is only made to work on a single core, no matter how many threads you assign that worker. I think the reason is simply that both the FFT (LL/P-1) and TF code were originally written to be single-threaded, and only the FFT code has been updated for multi-threading so far. I don't know of an exact plan to allow it to use more, but I'd imagine it's in the future plans.
In the meantime, you can run multiple TF jobs in parallel, or use other software like [URL="http://www.mersenneforum.org/showthread.php?t=7283"]Factor5[/URL] that is multi-thread enabled (but Factor5 is much slower per core than Prime95; usually it's just used for factoring numbers too large for Prime95). |
[QUOTE=Mini-Geek;237488]Assuming he's using Prime95's TF, this is an accurate assumption.[/QUOTE]
Oops, sorry, I forgot to mention that. Yes, I am using Prime95.
[QUOTE=Mini-Geek;237488]First, I'll note that you are correct that the TF of Prime95 is only made to work on a single core, no matter how many threads you assign that worker.[/QUOTE]
You got my point.
[QUOTE=Mini-Geek;237488]In the meantime, you can run multiple TF jobs in parallel, or use other software like [URL="http://www.mersenneforum.org/showthread.php?t=7283"]Factor5[/URL] that is multi-thread enabled [/QUOTE]
Thanks for the link. Anyhow, any plans for multithreading Prime95 TF? |
[QUOTE=otutusaus;237491]Thanks for the link. Anyhow, any plans for multithreading Prime95 TF?[/QUOTE]
I'd be curious to ask why. On the dual-core and larger PCs you are talking about, all but the largest TF assignments will finish in under a day; the bulk of server-assigned TF will complete in hours. As pointed out above, you can certainly run a separate TF on each core. |
Without having tried it, I assume that if you configure Prime95 to run LL tests across multiple cores, and TF is being done as part of the LL assignment, the TF part runs on one core only with the rest idling away. So much for the "last bit of TF after P-1" idea... :big grin:
|
[QUOTE=petrw1;237496]I'd be curious to ask why. On the dual-core and larger PCs you are talking about, all but the largest TF assignments will finish in under a day; the bulk of server-assigned TF will complete in hours.[/QUOTE]
I have some PCs that work only a few hours a day. I think it would be better to be able to send TF results more often (multicore) than to have several TF tests running (single core), especially in the case of 6 cores.
[QUOTE=petrw1;237496]As pointed out above, you can certainly run a separate TF on each core.[/QUOTE]
That's what I have been doing up to now. |
TF does not gain through-put by trying to do it on multiple cores. You will turn in more results per time period by having one core per test.
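A rough way to see this with numbers: the sketch below is not Prime95 code, just a toy model. It assumes each test costs a fixed number of core-hours and that spreading one test over several cores adds a hypothetical coordination overhead. The function name, the 4 core-hours per test, and the 2% overhead are all illustrative assumptions.

```python
def results_per_day(cores, core_hours_per_test, overhead=0.0, split=False):
    """Toy model: TF tests completed per 24 hours of wall-clock time.

    split=False: one independent test per core, no coordination cost.
    split=True:  one test at a time spread over all cores, inflated by a
                 hypothetical coordination overhead (0.02 means 2%).
    """
    if split:
        wall_hours_per_test = core_hours_per_test * (1.0 + overhead) / cores
        return 24.0 / wall_hours_per_test
    return cores * 24.0 / core_hours_per_test


if __name__ == "__main__":
    separate = results_per_day(cores=6, core_hours_per_test=4.0)
    shared = results_per_day(cores=6, core_hours_per_test=4.0,
                             overhead=0.02, split=True)
    print(f"one test per core:       {separate:.2f} results/day")
    print(f"one test across 6 cores: {shared:.2f} results/day")
```

Under these assumptions the split run hands in each result sooner (about 0.68 hours instead of 4), but it finishes slightly fewer tests per day. With zero overhead the two setups would tie; splitting can never come out ahead on throughput in this model.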
|
[QUOTE=Uncwilly;237534]TF does not gain through-put by trying to do it on multiple cores. You will turn in more results per time period by having one core per test.[/QUOTE]
I don't see why. One factor can be checked in every core; when a core is done with one factor, it takes the next on the list. I think that can be efficient and it doesn't seem difficult to implement. |
[QUOTE=Uncwilly;237534]TF does not gain through-put by trying to do it on multiple cores. You will turn in more results per time period by having one core per test.[/QUOTE]
[QUOTE=otutusaus;237576]I don't see why. One factor can be checked in every core; when a core is done with one factor, it takes the next on the list. I think that can be efficient and it doesn't seem difficult to implement.[/QUOTE]
Let me clarify: Doing one TF test on multiple cores does not achieve better results. Doing one TF per core (all working on separate numbers) gives the best through-put. |
[QUOTE=otutusaus;237576]I don't see why. One factor can be checked in every core; when a core is done with one factor, it takes the next on the list. I think that can be efficient and it doesn't seem difficult to implement.[/QUOTE]
In principle, yes. However, in practice, there is a certain latency associated with interprocess communication between cores. The TF code (as I understand it) does not straightforwardly do one factor candidate, then the next, then the next; it takes advantage of certain algorithmic shortcuts that entail doing the factor candidates within a particular bit level out of order. To split this up requires periodic (on the order of milliseconds) communication between threads to coordinate their effort. (Don't ask me why this is, I don't fully understand the specifics. :smile:)
You are correct, though, that TF does naturally lend itself better to multithreading than other worktypes. Similar programs used to search for other (non-Mersenne) types of primes have implemented such multithreading to great effect. However, even the best-optimized multithreaded programs will still have [i]some[/i] performance loss compared to running separate jobs on each core--ideally this is kept down to <1-2% or so, but there is a loss nonetheless. This is why single-exponent multithreaded TF hasn't been a priority at GIMPS to date; as individual TF bit-level assignments take only a few hours, there would be very little benefit at this point to splitting them over multiple cores. |
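For readers wondering what a "factor candidate" is here, a minimal sketch follows. It relies on two standard facts: any factor q of 2^p - 1 (p an odd prime) has the form q = 2kp + 1, and q must be 1 or 7 modulo 8. This is not Prime95's code. It skips the small-prime sieving that real programs do and the out-of-order shortcuts described above, and the function names and the stride-based split between workers are assumptions made purely for illustration.

```python
def find_factors(p, k_start, k_stop, k_step=1):
    """Return every q = 2*k*p + 1 that divides 2**p - 1,
    for k in range(k_start, k_stop, k_step)."""
    found = []
    for k in range(k_start, k_stop, k_step):
        q = 2 * k * p + 1
        if q % 8 not in (1, 7):
            continue  # a factor of 2**p - 1 must be +1 or -1 mod 8
        if pow(2, p, q) == 1:  # q divides 2**p - 1
            found.append(q)
    return found


def find_factors_split(p, k_stop, workers):
    """Hypothetical split: worker w takes k = 1 + w, 1 + w + workers, ...
    The slices are independent, so each could run on its own core; here
    they simply run one after another to keep the sketch short."""
    slices = [find_factors(p, 1 + w, k_stop, workers) for w in range(workers)]
    return sorted(q for found in slices for q in found)


if __name__ == "__main__":
    print(find_factors(11, 1, 5))          # 2**11 - 1 = 2047 = 23 * 89
    print(find_factors_split(29, 40, 3))   # 2**29 - 1 = 233 * 1103 * 2089
```

Each slice needs nothing from the others in this naive version, which is why TF looks easy to parallelize. The coordination cost discussed above only appears once candidates are sieved and processed in shared batches.
|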
[QUOTE=mdettweiler;237588] there is a certain latency associated with interprocess communication between cores.[/QUOTE]
[QUOTE=mdettweiler;237588] However, even the best-optimized multithreaded programs will still have [I]some[/I] performance loss compared to running separate jobs on each core--ideally this is kept down to <1-2% or so, but there is a loss nonetheless.[/QUOTE]
Whatever associated loss there is, it's already there now. I am not an expert, but when I run a TF I can see in the task manager that the job is already shared between cores (amounting in total to no more than a single-core job). So the "interprocess communication between cores" is already happening! Overall, I don't see why extending the process to all cores should slow it down much more. |
[QUOTE=otutusaus;237576]I don't see why. One factor can be checked in every core; when a core is done with one factor, it takes the next on the list. I think that can be efficient and it doesn't seem difficult to implement.[/QUOTE]
Why would you want to? If you are doing TF on (say) 10 different Mersenne candidates, it is even MORE efficient to devote a single core to each candidate. Ask yourself if you can make a (piece of) string longer by cutting it into pieces and tying the pieces together. |
[QUOTE=mdettweiler;237588]In principle, yes. However, in practice, there is a certain latency associated with interprocess communication between cores. The TF code (as I understand it) does not straightforwardly do one factor candidate, then the next, then the next; it takes advantage of certain algorithmic shortcuts that entail doing the factor candidates within a particular bit level out of order. To split this up requires periodic (on the order of milliseconds) communication between threads to coordinate their effort. (Don't ask me why this is, I don't fully understand the specifics. :smile:)
You are correct, though, that TF does naturally lend itself better to multithreading than other worktypes. Similar programs used to search for other (non-Mersenne) types of primes have implemented such multithreading to great effect. However, even the best-optimized multithreaded programs will still have [i]some[/i] performance loss compared to running separate jobs on each core--ideally this is kept down to <1-2% or so, but there is a loss nonetheless. This is why single-exponent multithreaded TF hasn't been a priority at GIMPS to date; as individual TF bit-level assignments take only a few hours, there would be very little benefit at this point to splitting them over multiple cores.[/QUOTE]
Reading common sense is so pleasant! |
[QUOTE=R.D. Silverman;237622]Why would you want to?
If you are doing TF on (say) 10 different Mersenne candidates, it is even MORE efficient to devote a single core to each candidate. Ask yourself if you can make a (piece of) string longer by cutting it into pieces and tying the pieces together.[/QUOTE]
Already answered in previous post:
[QUOTE=otutusaus;237617]Whatever associated loss there is, it's already there now. I am not an expert, but when I run a TF I can see in the task manager that the job is already shared between cores (amounting in total to no more than a single-core job). So the "interprocess communication between cores" is already happening! Overall, I don't see why extending the process to all cores should slow it down much more.[/QUOTE] |
[QUOTE=otutusaus;237625]Already answered in previous post:[/QUOTE]
It was not answered. You should rename yourself obtuseosaurus. If you want to be argumentative, go somewhere else. Your question was answered by several different people. |
[QUOTE=otutusaus;237617]Whatever associated loss there is, it's already there now. I am not an expert, but when I run a TF I can see in the task manager that the job is already shared between cores[/QUOTE]
I'm not sure how to explain the observed behavior, (is it really running on one core and the task manager is somehow wrong? is it a single thread switching between cores? I don't know, but in any case it's still just one thread, and is appropriately fast; if you want to experiment, tell Prime95 to put that worker on a specific core and see what happens to the speed and what appears in the task manager) but just accept the fact that it is slower to run a multi-threaded job than many single-threaded jobs. Why has already been explained quite nicely. |
[QUOTE=Mini-Geek;237629]I'm not sure how to explain the observed behavior, (is it really running on one core and the task manager is somehow wrong? is it a single thread switching between cores? I don't know, but in any case it's still just one thread, and is appropriately fast; if you want to experiment, tell Prime95 to put that worker on a specific core and see what happens to the speed and what appears in the task manager) but just accept the fact that it is slower to run a multi-threaded job than many single-threaded jobs. Why has already been explained quite nicely.[/QUOTE]
It's switching between cores. If you like you can set processor affinity for the thread and see if there's a performance difference; I doubt it. |
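For anyone who wants to try the affinity experiment outside Prime95, here is a minimal sketch. It assumes a platform where Python exposes `os.sched_setaffinity` (Linux does; Windows and macOS do not, and there you would use Prime95's own affinity setting or the Task Manager instead). `busy_work` and `timed_run` are hypothetical names, and the loop is only a stand-in for a single-threaded job, not real TF work.

```python
import os
import time


def busy_work(n):
    """CPU-bound stand-in for a single-threaded job such as one TF worker."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed_run(n, cpus=None):
    """Time busy_work(n). If cpus is given and the platform supports it,
    pin the process to that set of CPU numbers for the duration of the run.
    Returns elapsed wall-clock seconds."""
    can_pin = cpus is not None and hasattr(os, "sched_setaffinity")
    original = os.sched_getaffinity(0) if can_pin else None
    if can_pin:
        os.sched_setaffinity(0, cpus)
    try:
        start = time.perf_counter()
        busy_work(n)
        return time.perf_counter() - start
    finally:
        if can_pin:
            os.sched_setaffinity(0, original)  # restore the original CPU set


if __name__ == "__main__":
    free = timed_run(5_000_000)
    if hasattr(os, "sched_getaffinity"):
        one_cpu = {min(os.sched_getaffinity(0))}  # any single allowed CPU
        pinned = timed_run(5_000_000, one_cpu)
        print(f"scheduler's choice: {free:.2f} s, pinned to {one_cpu}: {pinned:.2f} s")
    else:
        print(f"scheduler's choice: {free:.2f} s (no affinity API here)")
```

If the two timings come out about the same, that supports the point above: the job is one thread either way, and letting the OS move it around costs little.
|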
[QUOTE=R.D. Silverman;237627]You should rename yourself obtuseosaurus.
If you want to be argumentative, go somewhere else. Your question was answered by several different people.[/QUOTE]
Mr. Silverman, I started posting less than a week ago and I am still getting familiar with how the Prime95 software works and with the maths behind prime search. I don't intend to be a burden to the forum, only to learn (maths, programming) and to suggest ways to improve our overall efforts. I regret having to read posts like yours. Please be more respectful and tolerant of other people's ignorance. |
[QUOTE=Mini-Geek;237629] if you want to experiment, tell Prime95 to put that worker on a specific core and see what happens to the speed and what appears in the task manager) [/QUOTE]
[QUOTE=CRGreathouse;237630]It's switching between cores. If you like you can set processor affinity for the thread and see if there's a performance difference; I doubt it.[/QUOTE]
Thanks for your patience and suggestions. I will definitely try that. |
[QUOTE=otutusaus;237634]Mr. Silverman, I started posting less than a week ago and I am still getting familiar....
I regret having to read posts like yours. Please be more respectful and tolerant of other people's ignorance.[/QUOTE]
That is Bob. Ignorance is one thing, but if he sees an unwillingness to learn, he gets testy. |
[QUOTE=R.D. Silverman;237627]
If you want to be argumentative, go somewhere else.[/QUOTE]
On the contrary: if he's looking for an argument, he's come to the right place. Or is this abuse?
David |
[QUOTE=Uncwilly;237717]That is Bob. Ignorance is one thing, but if he sees an unwillingness to learn, he gets testy.[/QUOTE]
The issue is not ignorance. He asked a reasonable question. But after he was given a response from several different posters he still continued to argue about it. |
I understood the fact that it is more efficient to run TF on a single core than to divide the job between cores. All answers were clear about that. I wasn't arguing about that anymore. This is settled. Single core is best. Thank you for your explanations.
What I was stating later is that it looks as if TF in Prime95 is running shared between cores (according to what I see in the Windows Task Manager). Anyway, I've run TF tests using different settings (Smart assignment in 1 thread/2 cores, Run on any CPU in 1 thread/2 cores, Run on any CPU in 1 thread per core, or as CPU #1 in 1 thread) and all timings are the same. |
[QUOTE=R.D. Silverman;237798]The issue is not ignorance. He asked a reasonable question.
But after he was given a response from several different posters he still continued to argue about it.[/QUOTE]
That does not go against what I said. |
[QUOTE=otutusaus;237801]I understood the fact that it is more efficient to run TF on a single core than to divide the job between cores. All answers were clear about that. I wasn't arguing about that anymore. This is settled. Single core is best. Thank you for your explanations.
What I was stating later is that it looks as if TF in Prime95 is running shared between cores (according to what I see in the Windows Task Manager).[/QUOTE]
It may look a bit like it in Task Manager, because Task Manager would be doing some time-averaging. What you are seeing is Prime95 being switched between cores (as CRGreathouse said earlier). This means it runs on one core for a time, then is moved to another, ..., at the whim of the OS. From our (human) perspective this can happen very quickly. This is [B]not[/B] the same thing as multithreading, where an application runs on two or more cores at the exact same time.
[SIZE="1"]Disclaimer: Regarding computers, I am an untrained lay-person. (IANAL equivalent.) Doubtless there are exceptions & more precise definitions than my essay.[/SIZE] |
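The time-averaging effect can be reproduced with a few lines of arithmetic. The sketch below is a toy model, not a measurement: it assumes a single thread that is moved to the next core after every time slice. The round-robin pattern and the function name are assumptions; a real scheduler follows no such fixed order.

```python
def per_core_busy_fraction(cores, time_slices):
    """Busy fraction of each core when ONE thread is moved to the next
    core after every time slice (round-robin is an assumption)."""
    busy = [0] * cores
    for t in range(time_slices):
        busy[t % cores] += 1  # the single thread occupies exactly one core per slice
    return [b / time_slices for b in busy]


if __name__ == "__main__":
    for cores in (2, 6):
        fractions = per_core_busy_fraction(cores, 6000)
        total = sum(fractions) / cores  # what an overall "CPU usage" figure averages to
        print(cores, "cores:", [f"{f:.1%}" for f in fractions], f"total {total:.2%}")
```

Every core looks partly busy, yet the total stays at one core's worth of work: 50% on two cores and about 16.67% on six, which matches the figures reported at the top of the thread.
|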
[QUOTE=R.D. Silverman;237798]The issue is not ignorance. He asked a reasonable question.
But after he was given a response from several different posters he still continued to argue about it.[/QUOTE]
God, you are such a reasonable man.
David
PS If you need any more ammo, try my latest post in the music thread. |
[QUOTE=otutusaus;237801]I understood the fact that it is more efficient to run TF on a single core than to divide the job between cores. All answers were clear about that. I wasn't arguing about that anymore. This is settled. Single core is best. Thank you for your explanations.
What I was stating later is that it looks as if TF in Prime95 is running shared between cores (according to what I see in the Windows Task Manager). Anyway, I've run TF tests using different settings (Smart assignment in 1 thread/2 cores, Run on any CPU in 1 thread/2 cores, Run on any CPU in 1 thread per core, or as CPU #1 in 1 thread) and all timings are the same.[/QUOTE]
You may try [URL="http://www.dewassoc.com/support/useful/wintop.htm"]Wintop[/URL] to check timings on different threads.
Luigi |
[QUOTE=markr;237841]This means it runs on one core for a time, then is moved to another, ..., at the whim of the OS.[/QUOTE]
This is correct, and is true on most* OS and most* hardware. The reason is that it makes for more even heat generation/dissipation within the CPU => cooler => more reliable => longer lifetime.
* most = all, in my personal experience, unless affinity has been used to override this. |
[QUOTE=Vato;237933]The reason is that it makes for more even heat generation/dissipation within the CPU => cooler => more reliable => longer lifetime.[/QUOTE]Thanks for pointing this out. It's something I'd hardly ever consider.
[QUOTE=Vato;237933]* most = all, in my personal experience, unless affinity has been used to override this.[/QUOTE]
[I]Does that mean that using affinity to assign particular GIMPS tasks to particular cores could, theoretically, reduce CPU reliability and/or lifetime[/I] (compared to the case where the OS periodically rotated tasks among cores) [I]if some of those tasks were more CPU-intensive than others?[/I] |
[QUOTE=Vato;237933]This is correct, and is true on most* OS and most* hardware.
The reason is that it makes for more even heat generation/dissipation within the CPU => cooler => more reliable => longer lifetime. * most = all, in my personal experience, unless affinity has been used to override this.[/QUOTE]
I'll need some evidence that this behavior is by design. It might well be, but I am skeptical because that behavior could have a substantial negative impact on performance due to increased cache misses and loss of locality in cached data (e.g. Core 2 Quads sharing L2 cache over FSB). |