mersenneforum.org


ixfd64 2010-12-10 20:29

Prime95 slowed down after I restarted computer
 
As I mentioned in another thread, I recently got access to a laptop with an i7-720QM processor. I installed Prime95 and configured it to do P-1 factoring on four worker windows, one for each physical core. I set the number of iterations between screen outputs to 500, with each output taking about 65 seconds.

However, it went up to about 115 seconds after I restarted the computer. At one point, it even slowed down to about ten minutes! Restarting the computer again seemed to solve the second problem, although the time between outputs is still about 115 seconds.

I am using Windows 7, and I never changed the priority of Prime95 (either through the task manager or Prime95 itself). Interestingly, the CPU usage was around 50% the whole time. Does anyone know what might be happening?

When I first ran Prime95, I let it use eight worker threads (one for each thread/logical core) by default. However, I later reduced it to four following Mini-Geek's suggestion. Could this have something to do with the slowdown?

edit: local.txt has the following line:

[code]ThreadsPerTest=1[/code]

Should I change it?

Prime95 2010-12-10 21:45

[QUOTE=ixfd64;241144]However, it went up to about 115 seconds after I restarted the computer.[/QUOTE]

I've seen this kind of thing before. It seems that there is a problem in figuring out which logical processors are real processors and which are the hyperthreaded ones. This results in two workers getting assigned to the same core.

The easiest workaround may be to change the affinity settings letting each worker run on any CPU. Hopefully, the OS will do a better job distributing the workload among the CPUs than Prime95 is doing.

ixfd64 2010-12-10 21:56

I changed the CPU affinity from "Smart assignment" to "Run on any CPU," and the average time between outputs is now even longer!

Would it help to disable hyperthreading?

garo 2010-12-10 21:59

Yes, I bet disabling hyper-threading will help!

ixfd64 2010-12-11 00:50

OK, I went to BIOS and turned off hyperthreading.

For a short time, Prime95 was faster than when I first installed it. However, the time between outputs is now several hundred seconds! Not to mention, my other programs are now lagging. What now? :help:

Uncwilly 2010-12-11 00:57

[QUOTE=ixfd64;241186]OK, I went to BIOS and turned off hyperthreading.

For a short time, Prime95 was faster than when I first installed it. However, the time between outputs is now several hundred seconds! Not to mention, my other programs are now lagging. What now? :help:[/QUOTE]
Is it thermal throttling?

Mini-Geek 2010-12-11 00:58

Maybe you should try re-enabling hyper-threading, run 4 workers on 1 thread each, and set the affinities using this option:[CODE]You can arbitrarily change how the program assigns affinity to CPUs.
The program makes its best guess at assigning workers and helper threads
to CPUs for optimal speed. However, new architectures or situations we
haven't considered may make different affinity setting desirable. In
local.txt set
AffinityScramble=string
Where the string "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz()"
is the 64-core "make no changes" string. For example, let's say you have a system
with 8 logical cores with 4 workers each using a helper thread. The program would
ordinarily assign the worker and helper threads to [0,1], [2,3], [4,5], [6,7].
However, if you think [0,2], [1,3], [4,6], [5,7] would give better performance,
you would set AffinityScramble=02134657 to test out your theory.
[/CODE]I think this would be what you want:
[CODE]AffinityScramble=0246[/CODE]
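One way to read the AffinityScramble description above, as a minimal Python sketch: each character of the string is looked up in the 64-character "make no changes" string to get a logical CPU number, and consecutive entries are handed out to the workers. The names `scramble_to_cpus` and `assign_threads` are made up for illustration, and the idea that workers simply take consecutive entries is an assumption based on the quoted text, not something checked against the Prime95 source.

```python
# The 64-core "make no changes" string quoted from the Prime95 documentation.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz()"


def scramble_to_cpus(scramble):
    """Translate each character of the scramble string into a logical CPU number."""
    return [ALPHABET.index(ch) for ch in scramble]


def assign_threads(scramble, workers, threads_per_worker):
    """Hand out consecutive scramble entries to workers (assumed behaviour)."""
    cpus = scramble_to_cpus(scramble)
    needed = workers * threads_per_worker
    if len(cpus) < needed:
        raise ValueError("scramble string is shorter than the number of threads")
    return [
        cpus[w * threads_per_worker:(w + 1) * threads_per_worker]
        for w in range(workers)
    ]


# Example from the documentation: 4 workers, each with one helper thread.
doc_example = assign_threads("02134657", workers=4, threads_per_worker=2)

# The setting suggested in this thread: 4 workers, 1 thread each.
suggested = assign_threads("0246", workers=4, threads_per_worker=1)
```

Under that reading, AffinityScramble=0246 puts the four single-threaded workers on logical CPUs 0, 2, 4 and 6. That is one worker per physical core only if the two hyperthreads of each core are numbered consecutively, which is also an assumption here.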
[QUOTE=Uncwilly;241188]Is it thermal throttling?[/QUOTE]
Good point. Another "obvious" thing we haven't asked (and maybe you haven't fully checked out): is anything else using large amounts of CPU, RAM, or hard drive access that might be interfering? Do you see similar slowdowns if you boot into safe mode?
[QUOTE=ixfd64;241144]I am using Windows 7, and I never changed the priority of Prime95 (either through the task manager or Prime95 itself). Interestingly, the CPU usage was around 50% the whole time. Does anyone know what might be happening?[/QUOTE]
Windows thinks of the HT cores the same as the physical ones (for that purpose anyway), so when it sees that they aren't being used, it thinks the CPU usage is 50% when it's really at max capacity for Prime95. Showing 50% usage is normal for HT-enabled machines.
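The 50% figure is just the ratio of busy threads to logical CPUs. A tiny Python sketch, assuming each worker keeps exactly one logical CPU fully busy and nothing else is running (`reported_usage_percent` is a hypothetical helper, not a Windows or Prime95 function):

```python
def reported_usage_percent(busy_threads, logical_cpus):
    """CPU usage as a monitor would show it if it counts every logical CPU equally."""
    return 100.0 * busy_threads / logical_cpus


# i7-720QM with hyper-threading on: 4 physical cores, 8 logical CPUs, 4 workers.
with_ht = reported_usage_percent(busy_threads=4, logical_cpus=8)

# Hyper-threading off: 4 logical CPUs, same 4 workers.
without_ht = reported_usage_percent(busy_threads=4, logical_cpus=4)
```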

ixfd64 2010-12-11 02:42

OK, I re-enabled hyperthreading and added "AffinityScramble=0246" to local.txt. The time between screen outputs is now close to when I first ran Prime95. I still get spikes of over 100 seconds from time to time, but I guess it's better than the earlier average of 115!

By the way, is it a bad idea to run P-1 on all four worker windows?

Mini-Geek 2010-12-11 03:22

[QUOTE=ixfd64;241212]OK, I re-enabled hyperthreading and added "AffinityScramble=0246" to local.txt. The time between screen outputs is now close to when I first ran Prime95. I still get spikes of over 100 seconds from time to time, but I guess it's better than the earlier average of 115![/QUOTE]
Sounds like the suggestion helped. :tu: Not perfect, but better...
[QUOTE=ixfd64;241212]By the way, is it a bad idea to run P-1 on all four worker windows?[/QUOTE]
Hmm...in stage 2 you might get a big slow-down due to memory bandwidth. But then LL needs fast bandwidth too, just not as much storage, so it might be a non-issue. Plus, if you have X MB of RAM for Prime95, each worker can only get between X/4 and X MB of RAM (usually closer to X/4), so the P-1 tests aren't as efficient as they could be. There's nothing wrong with doing P-1 on all four workers, as long as you first test how it works out, like you're doing now; I just mean don't leave it on 24/7 without seeing whether it works well. It's simplest, though, to just put one worker on P-1 and the rest on other tasks.

mdettweiler 2010-12-11 03:40

[QUOTE=Mini-Geek;241224]Hmm...in stage 2 you might get a big slow-down due to memory bandwidth. But then LL needs fast bandwidth too, just not as much storage, so it might be a non-issue. Plus, there's the fact that if you have X MB of RAM for Prime95, each worker can only get between X and X/4 MB of RAM, (closer to X/4 than X, usually) so the P-1 tests aren't as efficient as they could be. If you want to do it, and you first test it out to see how it works out, (like you're doing now - I just mean don't leave it on that 24/7 without seeing whether it works well or not) there's nothing wrong with doing P-1 on all four workers, but it's simplest to just put one worker on P-1 and the rest on other tasks.[/QUOTE]
Note that you can also set MaxHighMemWorkers=1 in prime.txt to ensure that only one worker runs stage 2 at any given time. (If two workers need to run stage 2, the second one will put its stage 2 on hold and move on to stage 1 of the next assignment until the first worker's stage 2 finishes.) With this, you can "safely" run all four cores on P-1.

Mini-Geek 2010-12-11 03:43

[QUOTE=mdettweiler;241227]Note that you can also set MaxHighMemWorkers=1 in prime.txt to ensure that only one worker runs stage 2 at any given time. (If two workers need to run stage 2, the second one will put its stage 2 on hold and move on to stage 1 of the next assignment until the first worker's stage 2 finishes.) With this, you can "safely" run all four cores on P-1.[/QUOTE]

MaxHighMemWorkers=2 would be better, because otherwise you'd always have 3 cores running stage 1 and 1 core running stage 2. The two stages take roughly the same time: in a test of [URL="http://mersenne-aries.sili.net/credit.php?worktype=P-1&exponent=33983237&f_exponent=&b1=385000&b2=7988750&numcurves=&factor=&frombits=&tobits=&submitbutton=Calculate"]CPU[/URL] [URL="http://mersenne-aries.sili.net/credit.php?worktype=P-1&exponent=33983237&f_exponent=&b1=385000&b2=385000&numcurves=&factor=&frombits=&tobits=&submitbutton=Calculate"]credit[/URL] just now, the whole thing should take about 2.076 times longer than stage 1 alone, so stage 2 is about 1.076 times as long as stage 1, which makes that a pretty close "roughly". With only one worker allowed in stage 2 at a time, stage 2 would continually fall behind stage 1. Obviously, that is not a good situation.
Note that "Default [setting for MaxHighMemWorkers] is available memory / 200MB."
Also remember that Memory can be worker-specific. This is particularly useful when you want each worker to use up to, say, 250 MB, regardless of what the other workers are doing. It didn't work correctly in the past, but newer versions have it fixed:[code]The Memory=n setting in local.txt refers to the total amount of memory the
program can use. You can also put this in the [Worker #n] section to place
a maximum amount of memory that one particular worker can use.
[/code]
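A rough back-of-the-envelope check of the stage 1 / stage 2 balance, as a Python sketch. It uses the 2.076 figure from the credit calculation above and assumes that every worker runs identical P-1 assignments, that there is no memory-bandwidth penalty, and that `cap` workers sit in stage 2 all the time while the rest stay in stage 1. The function names are hypothetical.

```python
def stage2_share(total_over_stage1):
    """Fraction of a whole P-1 run spent in stage 2 (stage 1 time taken as 1)."""
    return (total_over_stage1 - 1.0) / total_over_stage1


def balanced_stage2_workers(workers, total_over_stage1):
    """Average number of workers that would be in stage 2 if nothing had to wait."""
    return workers * stage2_share(total_over_stage1)


def backlog_growth(workers, cap, total_over_stage1):
    """Stage 2 jobs piling up per unit of stage 1 time under the stated assumptions."""
    stage2_time = total_over_stage1 - 1.0
    produced = workers - cap          # stage 1 completions per unit time
    consumed = cap / stage2_time      # stage 2 completions per unit time
    return produced - consumed


RATIO = 2.076  # whole run / stage 1, from the credit calculation above

ideal = balanced_stage2_workers(4, RATIO)
growth_cap1 = backlog_growth(4, 1, RATIO)
growth_cap2 = backlog_growth(4, 2, RATIO)
```

Under these assumptions the balanced figure comes out at a little over 2 workers in stage 2, so a cap of 1 leaves stage 2 far behind, while a cap of 2 is close to balanced.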

ixfd64 2010-12-12 05:48

Looking back at this thread, I have a suggestion: it would be useful if Prime95 had a feature that automatically detects (or tries to) the optimal CPU affinity setting.

ixfd64 2010-12-15 19:09

Hmm, Prime95 still slows down from time to time, sometimes down to 500 seconds between screen outputs for P-1 (like when I disabled hyper-threading); the only difference is that it doesn't happen as often as it originally did.

Does anyone know if version 26.x is better at handling hyper-threaded CPUs?

ixfd64 2010-12-18 06:56

On second thought, I don't think the "slowdown" is related to the hyper-threading at all. According to the official specs, the i7-720QM normally operates at 1.6 GHz, with a 2.8 GHz Turbo Boost. For the record, 2.8 / 1.6 = 1.75.

As I mentioned earlier, the time per 500 P-1 iterations was fluctuating between 65 and 115 seconds. 115 / 65 = 1.77, which is very similar to the ratio for the CPU clock speeds. So the "issue" was most likely the result of the CPU operating at its normal speed. That having been said, does anyone know of any good ways to maximize the use of Turbo Boost?

By the way, Prime95 will still occasionally slow down by an order of magnitude, but restarting the computer usually solves this problem.
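The clock-ratio comparison above can be checked with a few lines of Python. This is only the arithmetic from this post, using the 1.6 GHz and 2.8 GHz figures and the 65 and 115 second timings; it does not measure anything on the machine.

```python
BASE_GHZ = 1.6      # nominal clock of the i7-720QM
TURBO_GHZ = 2.8     # maximum Turbo Boost clock
FAST_SECONDS = 65   # time per 500 P-1 iterations when things run well
SLOW_SECONDS = 115  # time per 500 P-1 iterations after the slowdown

turbo_ratio = TURBO_GHZ / BASE_GHZ
time_ratio = SLOW_SECONDS / FAST_SECONDS

# Relative difference between the two ratios, as a fraction of the clock ratio.
relative_gap = abs(time_ratio - turbo_ratio) / turbo_ratio

print(f"clock ratio {turbo_ratio:.2f}, time ratio {time_ratio:.2f}, "
      f"gap {relative_gap:.1%}")
```

The two ratios differ by roughly one percent, which fits the idea that the slower timing is just the CPU running at its base clock.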

