mersenneforum.org  

2015-09-23, 22:00   #1
dragonbud20

Mar 2014

34 Posts
running single tests fast

I'm trying to help out with the strategic double check project a bit, and I want to figure out the best way to run one or two tests at a time. Currently I'm running one test on my 5930K, which has 6 physical cores with hyperthreading, for 12 logical cores. I have the worker windows settings with the number of workers set to 1, CPU affinity: "run on any CPU", and CPUs to use: "12". Is this the best way to do it? Is there any way to do it better? And I assume that if I'm running two tests I would want to set it to 2 workers and 6 CPUs each.

2015-09-24, 00:11   #2
chalsall
If I May

"Chris Halsall"
Sep 2002
Barbados

7·1,361 Posts

Quote:
Originally Posted by dragonbud20
and I assume that if I'm running two tests I would want to set it to 2 workers and 6 CPUs each.
Nope... Hyperthreads don't help at all; in fact, they slow things down. Prime95/mprime is carefully hand-crafted assembly, so you quickly become memory-bandwidth bound.

The next critical thing is "real, on-chip" CPU affinity. Madpoo can speak to what is needed under Windows.

If you have more than one physical CPU, then (and only then) do you go to multiple tests.

Last fiddled with by chalsall on 2015-09-24 at 00:11 Reason: Closed quote.

2015-09-24, 02:27   #3
dragonbud20

Mar 2014

34 Posts

Quote:
Originally Posted by chalsall
Nope... Hyperthreads don't help at all; in fact, they slow things down. Prime95/mprime is carefully hand-crafted assembly, so you quickly become memory-bandwidth bound.

The next critical thing is "real, on-chip" CPU affinity. Madpoo can speak to what is needed under Windows.

If you have more than one physical CPU, then (and only then) do you go to multiple tests.
Hmm... so my default settings run 6 workers, each using one physical core of my CPU. Is this actually sub-optimal? If this isn't the best way to go about doing tests, what is the best way on my particular CPU?

2015-09-24, 02:44   #4
VBCurtis

"Curtis"
Feb 2005
Riverside, CA

5·947 Posts

Quote:
Originally Posted by dragonbud20
Hmm... so my default settings run 6 workers, each using one physical core of my CPU. Is this actually sub-optimal? If this isn't the best way to go about doing tests, what is the best way on my particular CPU?
Experiment to find the best overall production. It's memory-bound, but different CPU/memory combos have different best setups. Try 6x1 (6 tests, 1 thread each), 3x2 (2 threads per test), and 2x3 (3 threads per test), and see what produces the most work done. You may also find that first-time LL tests have a different optimal setup than DCs, as smaller FFTs tend to saturate memory less than larger ones.

You may also find that, say, 5 LL tests give as much production as 6 tests, freeing that 6th core to do something else from this forum; ECM in particular is not very memory-bandwidth intensive.
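
The comparison above comes down to simple arithmetic, and a rough Python sketch of it follows. It assumes you have noted the ms/iteration that each worker reports for every layout, and that all layouts were timed on the same exponent size. The function names and every number in `example_timings` are made-up placeholders, not measurements from this thread.

```python
def iterations_per_second(workers, ms_per_iteration):
    """Combined iterations per second across all workers."""
    return workers * 1000.0 / ms_per_iteration


def rank_configs(timings):
    """Sort layouts from highest to lowest combined throughput.

    timings maps a label to (workers, ms_per_iteration_per_worker).
    """
    scored = [
        (label, iterations_per_second(workers, ms))
        for label, (workers, ms) in timings.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Placeholder numbers only, NOT measurements from this thread.
# Label format is workers x threads-per-worker.
example_timings = {
    "6x1": (6, 24.0),
    "3x2": (3, 11.5),
    "2x3": (2, 7.9),
    "1x6": (1, 4.2),
}

if __name__ == "__main__":
    for label, rate in rank_configs(example_timings):
        print(f"{label}: {rate:.1f} iterations/s in total")
```

Total iterations per second is only a fair yardstick when every worker is testing exponents of similar size, since larger FFTs take longer per iteration.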

2015-09-24, 02:50   #5
chalsall
If I May

"Chris Halsall"
Sep 2002
Barbados

7·1,361 Posts

Quote:
Originally Posted by dragonbud20
If this isn't the best way to go about doing tests, what is the best way on my particular CPU?
It is all rather complicated, I'm afraid. Just about every machine is different: L1 and L2 (possibly L3) caches, speed of memory, interleaved memory, other processing demands, etc.

The best way to find the "sweet spot" for optimal configuration is empirical testing. Try different settings and note the speeds achieved. Do the math (read: build a spreadsheet to record the data). Rinse and repeat (as necessary).

Prime95/mprime's benchmarking option helps a lot in this analysis, but I found that it ignored custom Affinity settings (at least under Linux), so I ended up running the same candidate dozens of times just to establish a baseline.

And, to put it on the table, I /only/ do Linux. Again, Madpoo is a better man to speak about optimizing Windows.

2015-09-24, 04:49   #6
LaurV
Romulan Interpreter

Jun 2011
Thailand

3³·347 Posts

Quote:
Originally Posted by dragonbud20
I have the worker windows settings with the number of workers set to 1, CPU affinity: "run on any CPU", and CPUs to use: "12". Is this the best way to do it? Is there any way to do it better? And I assume that if I'm running two tests I would want to set it to 2 workers and 6 CPUs each.
A better way for one worker is "run on any CPU, CPUs to use: 6"
Accordingly, for two workers use "CPUs to use: 3"

HT will only succeed in muddying the waters, without bringing any benefit. P95 is optimized enough to take advantage of the full core without HT. HT is only for programs that don't use the core fully (having idle ticks), so splitting it in two gives other processes/tasks/programs the opportunity to use the free ticks. That is not the case for P95: one of the logical cores waits because the other one uses all the cycles, then after the first finishes it waits for the second, then some more time is spent synchronizing them with each other, etc. So HT does not bring any benefit. On the contrary.

Note that I didn't say "the best way", but "a better way". The best way for you may be to reduce the number of cores further, for example if you have slow memory or not enough memory channels. But for sure, using 6 cores with a single worker is better than using 12 cores. You can use Options/Benchmark from the P95 menu, compute the output speed and the time you need in each case, and see for yourself which way is better for your system.
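
As a rough illustration of "compute the time you need in each case": a Lucas-Lehmer test of 2^p - 1 takes p - 2 squaring iterations, so a benchmark's ms/iteration converts directly into a run time. The sketch below assumes that figure stays constant for the whole test. The function names, the exponent and the timings are hypothetical placeholders.

```python
SECONDS_PER_DAY = 86_400


def ll_iterations(exponent):
    """A Lucas-Lehmer test of 2**exponent - 1 performs exponent - 2 squarings."""
    return exponent - 2


def ll_test_days(exponent, ms_per_iteration):
    """Estimated wall-clock days for one LL test at a constant iteration time."""
    seconds = ll_iterations(exponent) * ms_per_iteration / 1000.0
    return seconds / SECONDS_PER_DAY


def percent_faster(slow_ms, fast_ms):
    """Speed gain of fast_ms over slow_ms, in percent."""
    return (slow_ms / fast_ms - 1.0) * 100.0


if __name__ == "__main__":
    exponent = 40_000_000  # placeholder size, not a real assignment
    for label, ms in (("setting A", 4.6), ("setting B", 4.0)):  # made-up timings
        print(f"{label}: {ll_test_days(exponent, ms):.2f} days")
    print(f"B is {percent_faster(4.6, 4.0):.0f}% faster than A")
```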

2015-09-25, 02:31   #7
dragonbud20

Mar 2014

34 Posts

Hmm, so I've been doing a bit of testing, and it seems that 12 CPUs is 15% faster than 6 CPUs. The cause seems to be that P95 is only hitting about 50% CPU utilization. Any idea how to get better utilization? Also, when running more than one worker, what is the difference between smart assignment and run on any CPU, both technically and performance-wise?

2015-09-25, 05:14   #8
sdbardwick

Aug 2002
North San Diego County

682₁₀ Posts

Quote:
Originally Posted by dragonbud20
Hmm, so I've been doing a bit of testing, and it seems that 12 CPUs is 15% faster than 6 CPUs. The cause seems to be that P95 is only hitting about 50% CPU utilization. Any idea how to get better utilization? Also, when running more than one worker, what is the difference between smart assignment and run on any CPU, both technically and performance-wise?
A 15% increase using hyperthreaded cores strongly indicates that the threads are being allocated less than optimally. That is, some of the threads are running on logical cores that share the same physical core.
For example, the 6C12T box I use to quickly check single exponents is set up like this:
Code:
CPU Affinity: CPU #1
CPUs to use: 6
AffinityScramble2=13579B02468A (in the local.txt file)
This forces Prime95 to use one unshared logical core per thread and gives me the best throughput. Windows Task Manager shows roughly 50% CPU use, with every other logical core at roughly 100%.

If I turn hyperthreading off in the BIOS (and take out the AffinityScramble2 line), then Task Manager shows roughly 100% CPU use, and the iteration times don't change materially.
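
The idea of "one unshared logical core per thread" can be sketched in a few lines of Python. This is only an illustration and assumes that logical CPUs 2k and 2k+1 are the two hyperthreads of physical core k, which is a common enumeration but not guaranteed on every system. It does not parse or generate AffinityScramble2 strings, and all function names are hypothetical.

```python
def one_logical_per_core(physical_cores, threads_per_core=2):
    """Pick the first logical CPU of each physical core.

    Assumes logical CPUs are numbered so that siblings are adjacent.
    """
    return [core * threads_per_core for core in range(physical_cores)]


def shares_a_core(logical_cpus, threads_per_core=2):
    """True if any two of the chosen logical CPUs sit on the same physical core."""
    cores = [cpu // threads_per_core for cpu in logical_cpus]
    return len(set(cores)) < len(cores)


def affinity_mask(logical_cpus):
    """Bitmask with one bit set per chosen logical CPU."""
    mask = 0
    for cpu in logical_cpus:
        mask |= 1 << cpu
    return mask


if __name__ == "__main__":
    chosen = one_logical_per_core(6)  # 6 cores, 12 logical CPUs
    print(chosen, hex(affinity_mask(chosen)), shares_a_core(chosen))
```

Picking the second sibling of each pair instead of the first would serve equally well; what matters is that no two threads land on the same physical core.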

Last fiddled with by sdbardwick on 2015-09-25 at 05:21

2015-09-25, 05:30   #9
LaurV
Romulan Interpreter

Jun 2011
Thailand

2499₁₆ Posts

Quote:
Originally Posted by dragonbud20
50% CPU utilization
That is a Windows "bug". It is not really a bug, but it is how Windows counts the cores: you only use 6 of the 12 possible, so no matter what you do with them, you will not be able to see more than 50% occupancy. Your CPU is used close to 100% in that case, don't worry. Windoze distinguishes between a "single physical core" and "two logical cores running on a single physical core", and counts the latter as two cores (which is arguably wrong, but that is the case). If you disable HT in the BIOS, you will most probably still get your 15% more speed, on 6 cores only. In fact, it depends on many other things, including what else you are running on the (other) logical cores at the time. But the result may also mean that you are not memory-bandwidth limited, which is good. You may also run tests to see whether your 15% higher speed comes with 70% higher heat and/or current consumption (which is usually the case with HT; the consensus here is that HT is not good for LL, but of course it is up to you, and all computers/systems are different).
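
The 50% figure is just a matter of what gets counted, as this small Python sketch shows. It assumes logical CPUs 2k and 2k+1 share physical core k and that each busy logical CPU is fully loaded. The function names are hypothetical.

```python
def logical_utilization(busy_logical_cpus, total_logical):
    """Percentage of logical CPUs that are busy, as a per-logical-CPU meter counts it."""
    return 100.0 * len(busy_logical_cpus) / total_logical


def physical_utilization(busy_logical_cpus, total_logical, threads_per_core=2):
    """Percentage of physical cores with at least one busy logical CPU."""
    busy_cores = {cpu // threads_per_core for cpu in busy_logical_cpus}
    total_cores = total_logical // threads_per_core
    return 100.0 * len(busy_cores) / total_cores


if __name__ == "__main__":
    busy = [0, 2, 4, 6, 8, 10]  # one thread on each of 6 physical cores
    print(logical_utilization(busy, 12))   # share of the 12 logical CPUs in use
    print(physical_utilization(busy, 12))  # share of the 6 physical cores in use
```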

Edit: Crosspost, I started to reply before the last post was made. We are in a divergent agreement, like someone here used to say, so I will let my post be.

Last fiddled with by LaurV on 2015-09-25 at 05:42

2015-09-25, 06:41   #10
VBCurtis

"Curtis"
Feb 2005
Riverside, CA

5·947 Posts

I think you mean "violent agreement".

dragonbud should learn not to trust what Windows tells him.

2015-09-25, 06:49   #11
LaurV
Romulan Interpreter

Jun 2011
Thailand

3³·347 Posts

Quote:
Originally Posted by VBCurtis
"violent agreement"
Thanks, you don't know how much I wanted to remember that expression, and I still forgot it! Grrrr... I am really getting older.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.