#1
Sep 2002
89 Posts
I was wondering: if I decided to do a 100-million-digit LL test, what is the largest number of cores I could use before the extra cores are just wasted? Say on a 16-core machine, will using all 16 really tear through the work, or is there no better return than using 8 cores, or even 4?
#2
Undefined
"The unspeakable one"
Jun 2006
My evil lair
2²·3²·173 Posts
This has an easy answer: a single core is the most efficient. So doing 16 separate tests in parallel on a 16-core machine will yield the greatest number of tests completed per unit time. But that configuration will not yield the fastest time for a single LL test. To get one test done as quickly as possible, using all 16 cores on a single exponent will most likely give you an answer in the shortest time. It is up to you to decide where the diminishing returns set in, i.e. at what point adding more cores becomes pointless because it saves only a few minutes of overall wall-clock runtime.
Last fiddled with by retina on 2014-06-25 at 10:58 |
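The throughput-versus-latency trade-off described above can be sketched numerically. The per-test speedup figures below are invented for illustration (real scaling must come from benchmarking your own machine):

```python
# Sketch: tests/year throughput vs. per-test wall-clock time on a
# 16-core machine. speedup[n] = assumed speedup of one test using n
# cores vs. 1 core -- hypothetical numbers, not measurements.
speedup = {1: 1.0, 2: 1.9, 4: 3.4, 8: 5.5, 16: 7.8}

cores = 16
base_days = 360.0  # assumed single-core time for one 100M-digit LL test

for n, s in speedup.items():
    workers = cores // n               # independent tests running at once
    days_per_test = base_days / s      # wall-clock time for one test
    tests_per_year = workers / days_per_test * 365
    print(f"{n:2d} cores/test: {days_per_test:6.1f} days each, "
          f"{tests_per_year:5.2f} tests/year machine-wide")
```

With these (made-up) numbers, 1 core per test maximizes total throughput while 16 cores per test minimizes the time to a single answer, exactly the trade-off described above.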
#3
Mar 2011
Germany
93₁₀ Posts
Quote:
Unfortunately it is not that easy. In principle I agree with you that running the tests single-threaded is the most efficient. But on newer CPUs (Intel Core i7-2600, Core i7-4770, Xeon E5-2670) I observed the following behavior: running several workers single-threaded can give lower throughput than running fewer workers multithreaded. On the other hand, a single job with 20 threads is not faster than one with 4 threads; there is a crossover point that depends on the memory bandwidth and the CPU. Even hyperthreading might help, but often it just slows everything down. So it has to be tested in each specific case.

To find the largest throughput (assuming that you are testing many exponents of roughly the same size), use the internal benchmark. Put the following lines in prime.txt:

Code:
MinBenchFFT=<minimum size in KB>
MaxBenchFFT=<maximum size in KB>
BenchHyperthreads=1
BenchMultithreads=1

Replace the bracketed values with your desired FFT sizes and run the benchmark test. Finally, choose the setting with the largest total throughput (iter/sec). I hope this helps a bit...
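Once the benchmark has run, picking the winner is just a matter of comparing total iter/sec across configurations. A toy sketch of that comparison (the throughput numbers and the (workers, threads-per-worker) keys below are hypothetical, not real prime95 output):

```python
# Sketch: choosing the best worker/thread split from benchmark results.
# Keys are (workers, threads_per_worker); values are total iter/sec.
# All numbers are invented -- your own benchmark supplies the real ones.
results = {
    (4, 1): 30.1,   # four single-threaded workers
    (2, 2): 33.5,   # two workers with two threads each
    (1, 4): 28.9,   # one worker using all four cores
}

best = max(results, key=results.get)
print(f"best: {best[0]} worker(s) x {best[1]} thread(s), "
      f"{results[best]:.1f} iter/sec total")
```

In this invented example the mixed configuration wins, which matches the observation above that neither fully single-threaded nor fully multithreaded is guaranteed to be optimal.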
#4
Jun 2003
5,087 Posts |
Not necessarily so. AVX(2) processors are _severely_ memory-bottlenecked. Multiple tests running in parallel can step on each other's toes and hurt efficiency enough that running fewer tests multithreaded might still come out ahead. Benchmark your configuration.
EDIT:- It is also the case that, when multithreading, the default FFT length need not be the fastest. A larger FFT length could potentially be faster.

Last fiddled with by axn on 2014-06-25 at 14:47
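A rough back-of-the-envelope calculation shows why these tests are memory-bound: a 100M-digit exponent uses an 18M FFT, whose working set dwarfs any CPU cache. The pass count and iteration time below are assumptions for illustration, not prime95 internals:

```python
# Rough estimate of memory traffic per LL iteration for an 18M FFT.
# Assumptions: 8-byte double-precision elements, and the working set is
# streamed from RAM about 4 times per iteration (forward FFT, squaring,
# inverse FFT, carry propagation) -- a crude model, not prime95's actual
# access pattern.
fft_len = 18 * 2**20        # 18M elements
bytes_per_elem = 8
passes = 4                  # assumed full sweeps over the data per iteration
iter_time_s = 0.031         # ~31 ms/iteration, an assumed multi-thread figure

data_mb = fft_len * bytes_per_elem / 2**20
bandwidth_gbs = fft_len * bytes_per_elem * passes / iter_time_s / 2**30

print(f"working set: {data_mb:.0f} MB")          # far larger than any L3 cache
print(f"implied bandwidth: ~{bandwidth_gbs:.0f} GB/s")
```

Even under these crude assumptions the implied bandwidth is a large fraction of what dual-channel desktop memory can deliver, so adding cores quickly stops helping.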
#5
Sep 2009
32×233 Posts |
Would ECC memory make the bottleneck worse? How likely is a 100M test to complete successfully with/without ECC memory?
Chris
#6
May 2013
East. Always East.
1727₁₀ Posts
Quote:
Every single DC I have done with this CPU has come up with a matching residue, and I've been doing work on it long enough that two cores working together could probably have finished a 100-million-digit test in that time. So in that respect, my CPU could probably pull it off fine without ECC memory.

On the other hand, one single error in a one-year time frame would be enough to seriously doubt the integrity of a test that long, and either I haven't had that once-per-year error yet, or I am actually rock solid; it's impossible to say. Any CPU that has produced errors should not be used. And by the time a CPU has been field-tested long enough to be trusted with something like that, a CPU which could do the test in half the time, and more reliably, has already been released anyway.

TL;DR: I don't know the numbers, but I don't think it's worth taking the risk to run a 332M+ test without ECC.

Last fiddled with by TheMawn on 2014-06-25 at 19:30
#7
"Mr. Meeseeks"
Jan 2012
California, USA
2³×271 Posts
Quote:
Code:
[Jun 25 12:46] Worker starting
[Jun 25 12:46] Setting affinity to run worker on any logical CPU.
[Jun 25 12:46] Resuming primality test of M36032093 using FMA3 FFT length 1920K, Pass1=768, Pass2=2560
[Jun 25 12:46] Iteration: 2880392 / 36032093 [7.99%].
[Jun 25 12:46] Stopping primality test of M36032093 at iteration 2880848 [7.99%]
[Jun 25 12:46] Worker stopped.
[Jun 25 12:46] Worker starting
[Jun 25 12:46] Using FMA3 FFT length 18M, Pass1=1536, Pass2=12K
[Jun 25 12:46] p: 332610809. Time: 91.753 ms.
[Jun 25 12:46] p: 332610809. Time: 92.025 ms.
[Jun 25 12:46] p: 332610809. Time: 92.532 ms.
[Jun 25 12:46] p: 332610809. Time: 97.722 ms.
[Jun 25 12:46] p: 332610809. Time: 94.215 ms.
[Jun 25 12:46] p: 332610809. Time: 91.979 ms.
[Jun 25 12:46] p: 332610809. Time: 93.627 ms.
[Jun 25 12:46] p: 332610809. Time: 93.320 ms.
[Jun 25 12:46] p: 332610809. Time: 92.159 ms.
[Jun 25 12:46] p: 332610809. Time: 92.396 ms.
[Jun 25 12:46] Iterations: 10. Total time: 0.932 sec.
[Jun 25 12:46] Estimated time to complete this exponent: 358 days, 16 hours, 24 minutes.
[Jun 25 12:46] Using FMA3 FFT length 18M, Pass1=1536, Pass2=12K, 2 threads
[Jun 25 12:46] p: 332610809. Time: 47.963 ms.
[Jun 25 12:46] p: 332610809. Time: 47.947 ms.
[Jun 25 12:46] p: 332610809. Time: 48.065 ms.
[Jun 25 12:46] p: 332610809. Time: 48.333 ms.
[Jun 25 12:46] p: 332610809. Time: 48.325 ms.
[Jun 25 12:46] p: 332610809. Time: 48.028 ms.
[Jun 25 12:46] p: 332610809. Time: 48.003 ms.
[Jun 25 12:46] p: 332610809. Time: 48.373 ms.
[Jun 25 12:46] p: 332610809. Time: 49.163 ms.
[Jun 25 12:46] p: 332610809. Time: 48.811 ms.
[Jun 25 12:46] Iterations: 10. Total time: 0.483 sec.
[Jun 25 12:46] Estimated time to complete this exponent: 185 days, 22 hours, 37 minutes.
[Jun 25 12:46] Using FMA3 FFT length 18M, Pass1=1536, Pass2=12K, 3 threads
[Jun 25 12:46] p: 332610809. Time: 34.743 ms.
[Jun 25 12:46] p: 332610809. Time: 35.860 ms.
[Jun 25 12:46] p: 332610809. Time: 35.324 ms.
[Jun 25 12:46] p: 332610809. Time: 35.353 ms.
[Jun 25 12:46] p: 332610809. Time: 35.379 ms.
[Jun 25 12:46] p: 332610809. Time: 35.399 ms.
[Jun 25 12:46] p: 332610809. Time: 36.720 ms.
[Jun 25 12:46] p: 332610809. Time: 35.252 ms.
[Jun 25 12:46] p: 332610809. Time: 35.280 ms.
[Jun 25 12:46] p: 332610809. Time: 35.407 ms.
[Jun 25 12:46] Iterations: 10. Total time: 0.355 sec.
[Jun 25 12:46] Estimated time to complete this exponent: 136 days, 13 hours, 17 minutes.
[Jun 25 12:46] Using FMA3 FFT length 18M, Pass1=1536, Pass2=12K, 4 threads
[Jun 25 12:46] p: 332610809. Time: 31.609 ms.
[Jun 25 12:46] p: 332610809. Time: 31.396 ms.
[Jun 25 12:46] p: 332610809. Time: 31.041 ms.
[Jun 25 12:46] p: 332610809. Time: 31.134 ms.
[Jun 25 12:46] p: 332610809. Time: 30.889 ms.
[Jun 25 12:46] p: 332610809. Time: 31.028 ms.
[Jun 25 12:46] p: 332610809. Time: 30.719 ms.
[Jun 25 12:46] p: 332610809. Time: 30.670 ms.
[Jun 25 12:46] p: 332610809. Time: 30.823 ms.
[Jun 25 12:46] p: 332610809. Time: 30.896 ms.
[Jun 25 12:46] Iterations: 10. Total time: 0.310 sec.
[Jun 25 12:46] Estimated time to complete this exponent: 119 days, 10 hours, 2 minutes.
[Jun 25 12:46] Worker stopped.
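From the ten-iteration totals in the log above, the scaling on this machine can be computed directly:

```python
# Parallel speedup and efficiency from the benchmark totals quoted above
# (seconds for 10 iterations at FFT length 18M).
totals = {1: 0.932, 2: 0.483, 3: 0.355, 4: 0.310}

for n, t in totals.items():
    speedup = totals[1] / t      # vs. the single-threaded run
    efficiency = speedup / n     # fraction of ideal linear scaling
    print(f"{n} thread(s): speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

Two threads are about 96% efficient, but four threads manage only about 75%: the scaling falls off exactly as the memory-bandwidth bottleneck discussion earlier in the thread predicts.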
#8
May 2013
East. Always East.
11×157 Posts |
No. I would venture a guess that it takes more than a year for a CPU to become trustworthy for a one-year test. A human has a much better capacity for "do it once, then do it again better" than a computer does. Just because a CPU has worked flawlessly for a year on individual 3-day tests doesn't mean it can or will work flawlessly for a year on a single one-year test.

If a CPU goes three or four years without making a single error, I might then begin to trust its ability to complete a year-long project. By then, however, I should certainly hope CPUs have doubled...
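The asymmetry between short and long runs can be made concrete: if silent errors occur independently with some fixed per-day probability, a run only succeeds when every day is clean, so long runs are disproportionately risky. The error rate below is an assumption chosen purely for illustration:

```python
# Sketch: probability a run finishes with no silent error, assuming an
# invented, independent per-day error probability. Real rates are unknown
# and vary per machine.
per_day_error = 0.001  # assumed: roughly one error per 1000 machine-days

def p_clean(days):
    """Probability that all `days` days pass without a single error."""
    return (1 - per_day_error) ** days

print(f"3-day DC:   {p_clean(3):.1%} chance of a clean run")
print(f"365-day LL: {p_clean(365):.1%} chance of a clean run")
```

With this assumed rate a 3-day double check is almost always clean, while a year-long test fails silently roughly a third of the time, even though each individual day is equally reliable.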
#9
"/X\(‘-‘)/X\"
Jan 2013
2×5×293 Posts |
Quote:
#10
"Mr. Meeseeks"
Jan 2012
California, USA
878₁₆ Posts
Quote:
Don't overclock? Do you think CPUs are that error-prone at stock speeds?
#11
Undefined
"The unspeakable one"
Jun 2006
My evil lair
2²×3²×173 Posts
It doesn't matter how long you test it without seeing errors, because you still can't trust it. It is not a matter of proving your system is perfect; it is a matter of hoping you don't fall victim to a stray alpha particle, or a stray cosmic ray, flipping a bit or two. In fact you may already have experienced this, but the flipped bit(s) were in unused memory and thus had no effect. When running a single larger test you are using more memory, and so exposing yourself to a greater likelihood of having such flipped bits land within your test's working set. So even though you ran multiple DCs, their data size is much smaller, and you can't simply add up all the runtimes and claim it equivalent to a single large LL.
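The point that exposure scales with both memory footprint and runtime can be sketched with a toy model. The flip rate, working-set sizes, and durations below are all assumptions for illustration; published soft-error rates vary by orders of magnitude with altitude, shielding, and DRAM process:

```python
# Toy model: expected random bit flips scale as (memory in use) x (time).
flips_per_mbit_hour = 1e-7   # assumed rate, illustration only

def expected_flips(mem_mb, days):
    """Expected bit flips for a working set of mem_mb MB over `days` days."""
    return flips_per_mbit_hour * (mem_mb * 8) * (days * 24)

small = expected_flips(mem_mb=9, days=3)      # assumed ~9 MB set, 3-day DC
large = expected_flips(mem_mb=144, days=365)  # assumed ~144 MB set, 1-year LL

print(f"3-day DC:  {small:.4f} expected flips")
print(f"1-year LL: {large:.2f} expected flips")
print(f"ratio: {large / small:.0f}x the exposure")
```

Whatever the true rate is, the ratio is what matters: a bigger working set held for far longer multiplies the exposure, which is why many short DCs completing cleanly says little about one large LL.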