mersenneforum.org  

2015-10-21, 20:05   #1
CuriousKit
"J. Gareth Moreton"
Feb 2015
Nomadic

90₁₀ Posts
More efficient to reduce worker count?

Hi everyone,

I have a quad-core Intel Core i7-4770 at 3.40 GHz running 64-bit Windows 7. With 4 worker threads running, I'm currently averaging about 24 to 25 ms/iteration for first-time checks (in the 60M range) and about 14 ms/iteration for double checks (in the 35M range). If I turn off one of the worker threads, these times drop to 20 ms/iteration for first-time checks and 11 ms/iteration for double checks. Of course, I'm still learning the quirks of the thread scheduler, but for a computer that is also used as a work machine, this seems to be quite a boost in efficiency, although I'm not certain whether it equals or surpasses having an extra worker thread.

What are people's experiences with this?

2015-10-21, 21:20   #2
aurashift
Jan 2015

375₈ Posts

Don't have time to post an explanation, but being very succinct: google interrupts, hyperthreading, and turbo boost.

2015-10-21, 21:24   #3
petrw1
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada

2×5×7×67 Posts

Quote:
Originally Posted by CuriousKit View Post
Hi everyone,

I have a quad-core Intel Core i7-4770 at 3.40 GHz running 64-bit Windows 7. With 4 worker threads running, I'm currently averaging about 24 to 25 ms/iteration for first-time checks (in the 60M range) and about 14 ms/iteration for double checks (in the 35M range). If I turn off one of the worker threads, these times drop to 20 ms/iteration for first-time checks and 11 ms/iteration for double checks. Of course, I'm still learning the quirks of the thread scheduler, but for a computer that is also used as a work machine, this seems to be quite a boost in efficiency, although I'm not certain whether it equals or surpasses having an extra worker thread.

What are people's experiences with this?
Unless you are seriously RAM constrained, as my only 4770 is, stopping a worker or two will "slightly" improve iteration times, as you note, though a drop from 25 to 20 ms seems a little more than I would expect. You will still get more total thruput with 4 cores at 25 ms than with 3 at 20 ms.

And don't worry about the machine doing "real" work too. Prime95 runs at the lowest priority, and unless you are running memory-intensive work like P-1 or large ECM, you will NOT notice any impact on your work.

2015-10-21, 21:25   #4
petrw1
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada

2×5×7×67 Posts

Quote:
Originally Posted by aurashift View Post
Don't have time to post an explanation, but being very succinct: google interrupts, hyperthreading, and turbo boost.
Don't use Hyper-threading in Prime95.
If you have 4 Physical cores only run 4 Prime95 Workers (not the Hyper-threading proposes).

2015-10-21, 23:50   #5
aurashift
Jan 2015

11111101₂ Posts

Quote:
Originally Posted by petrw1 View Post
Don't use Hyper-threading in Prime95.
If you have 4 Physical cores only run 4 Prime95 Workers (not the Hyper-threading proposes).
I didn't propose that he use HT, just that he look it up. If you don't leave a physical core free, you'll be interrupting P95 so that your computer can do its everyday tasks, which will be threaded on a logical core. I'd recommend giving 'Intel Power Gadget' a try so you can see how turbo boost, heat, and load affect your CPU performance in real time.

2015-10-22, 03:39   #6
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013

2×5×293 Posts

I also have a 4770 and I have the same experience, with hyperthreading enabled.

3 cores slightly outperform 4.

2015-10-22, 03:44   #7
Madpoo
Serpentine Vermin Jar
Jul 2014

3,313 Posts

Quote:
Originally Posted by CuriousKit View Post
Hi everyone,

I have a quad-core Intel Core i7-4770 at 3.40 GHz running 64-bit Windows 7. With 4 worker threads running, I'm currently averaging about 24 to 25 ms/iteration for first-time checks (in the 60M range) and about 14 ms/iteration for double checks (in the 35M range). If I turn off one of the worker threads, these times drop to 20 ms/iteration for first-time checks and 11 ms/iteration for double checks. Of course, I'm still learning the quirks of the thread scheduler, but for a computer that is also used as a work machine, this seems to be quite a boost in efficiency, although I'm not certain whether it equals or surpasses having an extra worker thread.

What are people's experiences with this?
Besides the advice others gave, here's my experience with it.

For me, I would run two workers, each one using all the cores on its own CPU (it's a dual CPU system).

I experienced exactly what you're describing if one of the two workers was doing an LL test on any exponent > 58M. If that's the case, the other worker would need to be doing a DC on something below 38M otherwise they would both slow down a lot.

My theory had to do with memory contention or something like that, but whatever... I just know to either keep both workers doing something below 58M, or if I must do something larger, set the other one to a small double-check task.

I recently set some of my "lower end" systems to do a single core per worker, which means I have some with 8 or 12 workers going. I've found that the same basic rule applies... if any single worker is doing a test > 58M then it'll slow down anything else doing something > 38M. As long as I have all of them doing small work (double checks below 50M or so) I don't have to worry about it.

There is an exception to this, and that's on my newer system with dual Xeon E5 2697 v3 and DDR4 RAM. Not sure if it's the larger L1/L2/L3 caching or the faster DDR4 or just something else, but that > 58M limit doesn't apply.

Right now I've got it set for two workers (using all the cores on either CPU)... one is doing a 100M test and the other a 60M and they don't interfere with each other. I haven't really pushed it yet to see what the limit is for it... if I ever have the need I'll experiment to see where it winds up.
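
The rule of thumb reported in this post for the older systems can be written down as a small check. This is only a sketch of one poster's observation, not anything Prime95 itself does: the function name `likely_slowdown` and the two constants are hypothetical, and the 58M and 38M thresholds are simply the figures quoted above.

```python
# Thresholds taken from the post above; they are one user's empirical
# observation on specific hardware, not documented Prime95 limits.
LARGE_LIMIT = 58_000_000  # "any exponent > 58M"
SMALL_LIMIT = 38_000_000  # "something below 38M"


def likely_slowdown(exponents: list[int]) -> bool:
    """Return True if the reported rule predicts the workers will slow down.

    The rule: if any one worker tests an exponent above LARGE_LIMIT, every
    other worker should be on an exponent at or below SMALL_LIMIT.
    """
    for i, exponent in enumerate(exponents):
        if exponent > LARGE_LIMIT:
            others = exponents[:i] + exponents[i + 1:]
            if any(other > SMALL_LIMIT for other in others):
                return True
    return False


# One large LL test next to a small double check: no slowdown predicted.
print(likely_slowdown([60_000_000, 35_000_000]))
# One large LL test next to a mid-sized test: slowdown predicted.
print(likely_slowdown([60_000_000, 45_000_000]))
```

By the same post, the newer dual Xeon E5 2697 v3 system does not follow this rule, so the check would not apply there.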

2015-10-22, 04:00   #8
axn
Jun 2003

2×3×7×11² Posts

Quote:
Originally Posted by CuriousKit View Post
I have a quad-core Intel Core i7-4770 at 3.40 GHz running 64-bit Windows 7. With 4 worker threads running, I'm currently averaging about 24 to 25 ms/iteration for first-time checks (in the 60M range) and about 14 ms/iteration for double checks (in the 35M range). If I turn off one of the worker threads, these times drop to 20 ms/iteration for first-time checks and 11 ms/iteration for double checks. Of course, I'm still learning the quirks of the thread scheduler, but for a computer that is also used as a work machine, this seems to be quite a boost in efficiency, although I'm not certain whether it equals or surpasses having an extra worker thread.
4 workers at 24-25 ms/iter = 160-166 iters/sec thruput
3 workers at 20 ms/iter = 150 iters/sec thruput

4 workers at 14 ms/iter = 285 iters/sec thruput
3 workers at 11 ms/iter = 272 iters/sec thruput

Suggesting that you're very nearly memory bottlenecked, but 4 workers is still better than 3 workers in terms of total productivity.
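
The arithmetic above is just workers divided by time per iteration. A minimal sketch that reproduces it, using the timings from the original post (the helper name `throughput` is made up for illustration):

```python
def throughput(ms_per_iter: float, workers: int) -> float:
    """Total iterations per second summed over all workers."""
    return workers * 1000.0 / ms_per_iter


# (label, ms per iteration, worker count), timings as reported in the thread.
cases = [
    ("first-time checks, 4 workers @ 25 ms", 25, 4),
    ("first-time checks, 4 workers @ 24 ms", 24, 4),
    ("first-time checks, 3 workers @ 20 ms", 20, 3),
    ("double checks,     4 workers @ 14 ms", 14, 4),
    ("double checks,     3 workers @ 11 ms", 11, 3),
]

for label, ms, workers in cases:
    print(f"{label}: {throughput(ms, workers):.1f} iters/sec")
```

This only compares raw iteration rates. It ignores the point made earlier in the thread that a free core may make the machine feel more responsive for everyday work.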

What is your RAM spec?

2015-10-22, 15:03   #9
CuriousKit
"J. Gareth Moreton"
Feb 2015
Nomadic

2·3²·5 Posts

I have 16 GB of RAM on the machine in question.

2015-10-22, 15:11   #10
chalsall
If I May
"Chris Halsall"
Sep 2002
Barbados

9,767 Posts

Quote:
Originally Posted by CuriousKit View Post
I have 16 GB of RAM on the machine in question.
Dual channel? As in, pair(s) of "sticks"? I assume so based on the amount of RAM you have, but it's an important variable.

Also, what speed?

Another thing which is absolutely critical in optimizing Prime95/mprime is to get the affinity correct. At least under Linux (mprime) I have found that without explicitly setting the affinity manually (via the AffinityScramble2 parameter in local.txt) the processes can often jump around, sometimes ending up with hyperthreaded "virtual" cores being used.
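
Before setting any affinity by hand, it helps to know which logical CPUs are hyperthread siblings of the same physical core. The sketch below is an assumption-laden illustration for Linux only: it reads the kernel's `thread_siblings_list` files under `/sys/devices/system/cpu` and picks one logical CPU per core. It does not touch Prime95/mprime settings and does not show the syntax of `AffinityScramble2`; the function names are hypothetical.

```python
from pathlib import Path


def parse_cpu_list(text: str) -> list[int]:
    """Parse a sysfs CPU list such as "0,4" or "0-1" into sorted CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def one_cpu_per_core(siblings: dict[int, str]) -> list[int]:
    """Pick the lowest-numbered logical CPU from each group of siblings."""
    return sorted({parse_cpu_list(text)[0] for text in siblings.values()})


def read_siblings(root: str = "/sys/devices/system/cpu") -> dict[int, str]:
    """Read thread_siblings_list for each CPU that exposes it (Linux only)."""
    result = {}
    for path in Path(root).glob("cpu[0-9]*/topology/thread_siblings_list"):
        cpu = int(path.parent.parent.name[3:])  # directory name is "cpuN"
        result[cpu] = path.read_text()
    return result


# Layout assumed purely for illustration: 4 cores and 8 logical CPUs,
# where CPU n and CPU n+4 share a physical core.
example = {n: f"{n % 4},{n % 4 + 4}" for n in range(8)}
print(one_cpu_per_core(example))
```

On a real Linux machine, `one_cpu_per_core(read_siblings())` would give the list for that system. The numbering of siblings differs between machines, which is why it is worth checking rather than assuming.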

2015-10-22, 15:50   #11
petrw1
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada

2×5×7×67 Posts

Quote:
Originally Posted by aurashift View Post
I didn't propose that he use HT, just to look it up.
Yes, I understood it that way... maybe I was too quick to simply say DON'T, but I have seen VERY VERY little commentary to the contrary in this forum.

All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.