mersenneforum.org

bplenhart 2017-10-10 03:08

Worker iteration speed slow down
 
I am seeing workers slow down significantly when they complete an exponent and start a new one. The worker runs at almost double the ms/iter on the new exponent. I believe the FFT size is the same for both exponents. Stopping and restarting the worker has no effect. Stopping all workers, exiting the program, and then restarting it lets all workers run at the 'normal' ms/iter again. The difference is 30 ms/iter increasing to 48 ms/iter for the same FFT size.
This is happening on 8 different machines.

I have screen shots of 2 machines attached.

I apologize if this behavior is documented elsewhere in the forum; I was unable to find any information.

bplenhart

Mark Rose 2017-10-10 06:18

Both of your previous exponents barely fit into a 4096k FFT with an acceptable round-off error. Your new ones require a larger FFT, and Prime95 chose a 4480k FFT.
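
To make the round-off argument concrete, here is a rough Python sketch of how many bits of the exponent each FFT element has to carry on average. It assumes "4096k" means 4096 x 1024 elements and that the average load is simply exponent / FFT length. The helper name bits_per_word is made up for illustration, and Prime95's actual size cutoffs are not reproduced here.

```python
# Rough sketch: average bits per FFT element for the exponents in this thread.
# Assumption: an "Nk" FFT has N * 1024 elements, and the average load per
# element is exponent / FFT length. This is NOT Prime95's selection logic.

def bits_per_word(exponent: int, fft_k: int) -> float:
    """Average number of bits each FFT element must hold."""
    return exponent / (fft_k * 1024)

previous = [77896541, 77842553]   # exponents that ran with the 4096k FFT
new = [78591281, 78564151]        # exponents that were moved to 4480k

for p in previous + new:
    print(f"M{p}: {bits_per_word(p, 4096):.3f} bits/word at 4096k, "
          f"{bits_per_word(p, 4480):.3f} bits/word at 4480k")
```

For these exponents the 4096k load comes out at roughly 18.56 to 18.57 bits per word for the previous pair and roughly 18.73 to 18.74 for the new pair, dropping to about 17.1 at 4480k. More bits per word means larger round-off error, which is why a slightly larger exponent can tip over into the next FFT size.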

bplenhart 2017-10-10 11:16

OK, the FFT size may be the next size larger. I would expect a modest increase in ms/iter, not almost double.

Also, exiting the Prime95 program and restarting clears this behavior -- meaning I get the expected ms/iter throughput for that worker.

bplenhart 2017-10-10 11:49

Data

Machine 1:
LL exponent M77896541, 29.4 ms/iter --> complete LL & start next exponent --> LL exponent M78591281, 47.7 ms/iter --> stop & exit Prime95 --> restart Prime95 --> LL exponent M78591281, 31.5 ms/iter.

Machine 2:
LL exponent M77842553, 30.7 ms/iter --> complete LL & start next exponent --> LL exponent M78564151, 46.7 ms/iter --> stop & exit Prime95 --> restart Prime95 --> LL exponent M78564151, 32.1 ms/iter.


Restarting the program seems to fix the throughput.
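
The figures above can be turned into percentages with a few lines of Python. This is only a sketch: the timings are the ones posted here, while the n*log2(n) cost model used to estimate what a 4096k to 4480k FFT change "should" cost is an assumption, not a measurement, and the function names are invented for this example.

```python
import math

# (ms/iter on old exponent, ms/iter on new exponent, ms/iter after restart)
runs = {
    "Machine 1": (29.4, 47.7, 31.5),
    "Machine 2": (30.7, 46.7, 32.1),
}

def pct_increase(before: float, after: float) -> float:
    """Percentage increase from `before` to `after`."""
    return (after / before - 1.0) * 100.0

def nlogn_ratio(n_old: int, n_new: int) -> float:
    """Cost ratio under an assumed n*log2(n) model for FFT work."""
    return (n_new * math.log2(n_new)) / (n_old * math.log2(n_old))

# Assumed: an "Nk" FFT has N * 1024 elements.
expected = (nlogn_ratio(4096 * 1024, 4480 * 1024) - 1.0) * 100.0
print(f"Estimated increase from FFT size alone: {expected:.1f}%")

for name, (old, slow, restarted) in runs.items():
    print(f"{name}: {pct_increase(old, slow):.1f}% before restart, "
          f"{pct_increase(old, restarted):.1f}% after restart")
```

On the posted numbers this gives roughly 62% and 52% before the restart versus roughly 7% and 5% after it, against an estimate of about 10% for the FFT size change alone. So the post-restart timings look like the larger FFT, and the pre-restart timings do not.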

VictordeHolland 2017-10-10 12:32

Is this with the new Prime95 version?

Mark Rose 2017-10-10 13:01

VictordeHolland wrote: "Is this with the new Prime95 version?"

Yes, it has the Jacobi check.


bplenhart, what CPUs are you running?

bplenhart 2017-10-10 21:25

Machine 1: Haswell i5-4570
Machine 2: Haswell i5-4430

Prime95 v29.3.
I noticed this behavior in v28.10 also.

All of my machines do this, including an i5-4690K and an i5-6600K.

Madpoo 2017-10-13 18:50

If I had to guess, after extended runtime the CPU is heating up and throttling its clock. Then you stop it and the CPU gets a chance to cool off, so when you start it again it goes fast, then heats up, throttles, etc.

Use something like CPU-Z to look at the actual CPU clock rate while running and also when idle, and also note the rate when Prime95 first starts up compared to after it's been running a while and the iteration times are slower.

Note that your idle clock rates may be low thanks to power savings. If your BIOS has the option, you can set your system to run at full turbo speed all the time, but otherwise the default will slow the clock rate if nothing's happening.

On a few servers I see this happening. They're all set for max performance, but when Prime95 starts they throttle down by a few turbo steps due to thermal or TDP limits. On dual-socket systems that sometimes means one socket runs faster than the other. Or, because of thermal interactions, starting up one worker makes that CPU run nice and fast, but firing up another worker on the 2nd socket causes the first worker to slow down because its CPU got a little warmer and clocked down a bit.
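
Once you have clock rates noted from something like CPU-Z, a quick way to test the throttling guess is to check whether clock x ms/iter stays about the same between the fast and slow periods. Below is a minimal Python sketch of that check. The sample readings are hypothetical, not taken from this thread, the function names are made up, and the 10% tolerance is arbitrary. It also assumes iteration time scales with core clock, which is only approximate if the work is limited by memory bandwidth.

```python
# Sketch: does a drop in clock rate account for the slower iterations?
# Inputs are (clock in MHz, ms per iteration) pairs noted by hand.

def cycles_per_iter(clock_mhz: float, ms_per_iter: float) -> float:
    """Core clock cycles spent on one iteration."""
    return clock_mhz * 1e6 * ms_per_iter / 1e3

def throttling_explains(fast, slow, tolerance: float = 0.10) -> bool:
    """True if cycles/iter is about equal in both samples, i.e. the
    slowdown is matched by a proportional drop in clock rate."""
    c_fast = cycles_per_iter(*fast)
    c_slow = cycles_per_iter(*slow)
    return abs(c_slow - c_fast) / c_fast <= tolerance

# HYPOTHETICAL readings, for illustration only.
print(throttling_explains((3600, 30.0), (2250, 48.0)))  # clock fell with speed
print(throttling_explains((3600, 30.0), (3600, 48.0)))  # clock unchanged
```

If the clock is unchanged while ms/iter jumps, throttling is probably not the explanation and something else is costing cycles.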


All times are UTC.