mersenneforum.org (https://www.mersenneforum.org/index.php)
-   PrimeNet (https://www.mersenneforum.org/forumdisplay.php?f=11)
-   -   Most efficient cores to use for 100million LL (https://www.mersenneforum.org/showthread.php?t=19450)

lpmurray 2014-06-25 10:49

Most efficient cores to use for 100million LL
 
I was wondering: if I decided to do a 100-million-digit LL test, what is the most cores I could use before I'm just wasting them? Say on a 16-core machine, will using all 16 really tear through the work, or is there no better return than using 8 cores or 4 cores?

retina 2014-06-25 10:57

[QUOTE=lpmurray;376686]I was wondering if I decided to do a 100million LL what is the most cores I could use before just wasting cores say on a 16 core machine will using all 16 really tear up the work or is there no better return then using 8 cores or 4cores[/QUOTE]This has an easy answer: a single core is the most efficient. So running 16 separate tests in parallel on a 16-core machine will yield the greatest number of tests completed per unit time. But this configuration will not yield the fastest time for a single LL test. To get one test done as quickly as possible, using all 16 cores on a single exponent will most likely give you an answer in the shortest time. It is up to you to decide where the diminishing returns set in: at some point, adding more cores to shave only a few minutes off the overall wall-clock runtime becomes pointless.

MrRepunit 2014-06-25 12:05

[QUOTE=retina;376687]This has an easy answer: A single core is the most efficient. So doing 16 separate tests on a 16-core machine in parallel will yield the most number of tests done per unit time. But this configuration will not yield the fastest time for a single LL test. To get a single test done as quickly as possible then using all 16-cores on a single exponent will most likely get you an answer in the shortest time. It is up to you how you wish to apply the diminishing returns valuation function before using more cores makes it pointless to save only a few minutes in overall wall-clock runtime.[/QUOTE]


Unfortunately it is not that easy. In principle I agree with you that running the tests single-threaded is the most efficient. But on newer CPUs (Intel Core i7-2600, Core i7-4770, Xeon E5-2670) I have observed the following behavior: running several workers single-threaded gives lower throughput than running fewer workers multithreaded. At the same time, a single job with 20 threads is not faster than one with 4 threads; there is a crossover point that depends on the memory bandwidth and the CPU. Even hyperthreading might help, but often it just slows everything down. So it has to be tested for each specific case.

To find the largest throughput (assuming you are testing many exponents of roughly the same size), use the internal benchmark. Put the following lines in prime.txt:

[code]
MinBenchFFT=<minimum size in KB>
MaxBenchFFT=<maximum size in KB>
BenchHyperthreads=1
BenchMultithreads=1
[/code]

Replace the bracketed values with your desired FFT size range and run the benchmark. Finally, choose the setting with the largest total throughput (iter/sec).

I hope this helps a bit...
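To make "choose the setting with the largest total throughput" concrete, here is a minimal sketch of the comparison. The per-worker rates below are hypothetical placeholders, not real benchmark results; substitute the iter/sec figures from your own Prime95 benchmark output.

```python
# Sketch: pick the worker/thread split with the highest total throughput.
# The iter/sec numbers are HYPOTHETICAL placeholders for illustration.

# Each entry: (description, number of workers, iter/sec per worker)
configs = [
    ("4 workers x 1 thread", 4, 6.0),
    ("2 workers x 2 threads", 2, 11.0),
    ("1 worker x 4 threads", 1, 20.0),
]

def total_throughput(workers, per_worker_rate):
    # Total throughput is simply workers times the per-worker rate.
    return workers * per_worker_rate

best = max(configs, key=lambda c: total_throughput(c[1], c[2]))
for name, w, r in configs:
    print(f"{name}: {total_throughput(w, r):.1f} iter/sec total")
print("Best for throughput:", best[0])
```

Note that the highest-throughput configuration (many single-threaded workers, in this made-up example) is usually not the one with the shortest time per exponent, which is exactly the trade-off discussed above.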

axn 2014-06-25 14:46

[QUOTE=retina;376687]This has an easy answer: A single core is the most efficient. [/QUOTE]

Not necessarily so. AVX(2) processors are _severely_ memory-bottlenecked. Multiple tests running in parallel can step on each other's toes and decrease efficiency to the point that running fewer tests multi-threaded might still come out ahead. Benchmark your configuration.

EDIT:- It is also the case that, when multi-threading, the default FFT need not be the fastest. A higher FFT length could potentially be faster.

chris2be8 2014-06-25 15:49

Would ECC memory make the bottleneck worse? How likely is a 100M test to complete successfully with/without ECC memory?

Chris

TheMawn 2014-06-25 19:29

[QUOTE=chris2be8;376707]Would ECC memory make the bottleneck worse? How likely is a 100M test to complete successfully with/without ECC memory?

Chris[/QUOTE]

ECC memory is slower in general, so it would make an existing bottleneck worse. I would think ECC memory is a must for any test that starts to take months rather than weeks.

Every single DC I have done with this CPU has come up with a matching residue, and I've been doing work on it long enough that it probably could have finished a 100-million-digit test with two cores working together. So in that respect, my CPU could probably pull it off fine without ECC memory.

On the other hand, one single error in a one-year time frame would be enough to seriously doubt the integrity of a test that long. Either I haven't hit that once-per-year error yet or my machine is actually rock solid, but it's impossible to say. Any CPU that [B]has[/B] produced errors should not be used.

By the time a CPU has been field tested long enough to be trusted with something like that, a CPU which could do the test in half the time and more reliably has already been released anyway.

TL;DR: I don't know the numbers, but I don't think it's worth the risk to run a 332M+ exponent without ECC.

kracker 2014-06-25 19:47

[QUOTE=TheMawn;376725]
By the time a CPU has been field tested long enough to be trusted with something like that, a CPU which could do the test in half the time and more reliably has already been released anyway.[/QUOTE]

:huh:

[code]
[Jun 25 12:46] Worker starting
[Jun 25 12:46] Setting affinity to run worker on any logical CPU.
[Jun 25 12:46] Resuming primality test of M36032093 using FMA3 FFT length 1920K, Pass1=768, Pass2=2560
[Jun 25 12:46] Iteration: 2880392 / 36032093 [7.99%].
[Jun 25 12:46] Stopping primality test of M36032093 at iteration 2880848 [7.99%]
[Jun 25 12:46] Worker stopped.
[Jun 25 12:46] Worker starting
[Jun 25 12:46] Using FMA3 FFT length 18M, Pass1=1536, Pass2=12K
[Jun 25 12:46] p: 332610809. Time: 91.753 ms.
[Jun 25 12:46] p: 332610809. Time: 92.025 ms.
[Jun 25 12:46] p: 332610809. Time: 92.532 ms.
[Jun 25 12:46] p: 332610809. Time: 97.722 ms.
[Jun 25 12:46] p: 332610809. Time: 94.215 ms.
[Jun 25 12:46] p: 332610809. Time: 91.979 ms.
[Jun 25 12:46] p: 332610809. Time: 93.627 ms.
[Jun 25 12:46] p: 332610809. Time: 93.320 ms.
[Jun 25 12:46] p: 332610809. Time: 92.159 ms.
[Jun 25 12:46] p: 332610809. Time: 92.396 ms.
[Jun 25 12:46] Iterations: 10. Total time: 0.932 sec.
[Jun 25 12:46] Estimated time to complete this exponent: 358 days, 16 hours, 24 minutes.
[Jun 25 12:46] Using FMA3 FFT length 18M, Pass1=1536, Pass2=12K, 2 threads
[Jun 25 12:46] p: 332610809. Time: 47.963 ms.
[Jun 25 12:46] p: 332610809. Time: 47.947 ms.
[Jun 25 12:46] p: 332610809. Time: 48.065 ms.
[Jun 25 12:46] p: 332610809. Time: 48.333 ms.
[Jun 25 12:46] p: 332610809. Time: 48.325 ms.
[Jun 25 12:46] p: 332610809. Time: 48.028 ms.
[Jun 25 12:46] p: 332610809. Time: 48.003 ms.
[Jun 25 12:46] p: 332610809. Time: 48.373 ms.
[Jun 25 12:46] p: 332610809. Time: 49.163 ms.
[Jun 25 12:46] p: 332610809. Time: 48.811 ms.
[Jun 25 12:46] Iterations: 10. Total time: 0.483 sec.
[Jun 25 12:46] Estimated time to complete this exponent: 185 days, 22 hours, 37 minutes.
[Jun 25 12:46] Using FMA3 FFT length 18M, Pass1=1536, Pass2=12K, 3 threads
[Jun 25 12:46] p: 332610809. Time: 34.743 ms.
[Jun 25 12:46] p: 332610809. Time: 35.860 ms.
[Jun 25 12:46] p: 332610809. Time: 35.324 ms.
[Jun 25 12:46] p: 332610809. Time: 35.353 ms.
[Jun 25 12:46] p: 332610809. Time: 35.379 ms.
[Jun 25 12:46] p: 332610809. Time: 35.399 ms.
[Jun 25 12:46] p: 332610809. Time: 36.720 ms.
[Jun 25 12:46] p: 332610809. Time: 35.252 ms.
[Jun 25 12:46] p: 332610809. Time: 35.280 ms.
[Jun 25 12:46] p: 332610809. Time: 35.407 ms.
[Jun 25 12:46] Iterations: 10. Total time: 0.355 sec.
[Jun 25 12:46] Estimated time to complete this exponent: 136 days, 13 hours, 17 minutes.
[Jun 25 12:46] Using FMA3 FFT length 18M, Pass1=1536, Pass2=12K, 4 threads
[Jun 25 12:46] p: 332610809. Time: 31.609 ms.
[Jun 25 12:46] p: 332610809. Time: 31.396 ms.
[Jun 25 12:46] p: 332610809. Time: 31.041 ms.
[Jun 25 12:46] p: 332610809. Time: 31.134 ms.
[Jun 25 12:46] p: 332610809. Time: 30.889 ms.
[Jun 25 12:46] p: 332610809. Time: 31.028 ms.
[Jun 25 12:46] p: 332610809. Time: 30.719 ms.
[Jun 25 12:46] p: 332610809. Time: 30.670 ms.
[Jun 25 12:46] p: 332610809. Time: 30.823 ms.
[Jun 25 12:46] p: 332610809. Time: 30.896 ms.
[Jun 25 12:46] Iterations: 10. Total time: 0.310 sec.
[Jun 25 12:46] Estimated time to complete this exponent: 119 days, 10 hours, 2 minutes.
[Jun 25 12:46] Worker stopped.
[/code]

In one year CPU speeds double?
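The diminishing returns in the benchmark log above can be quantified directly. A small sketch, using the 10-iteration totals printed in the log (0.932 s, 0.483 s, 0.355 s, 0.310 s for 1 to 4 threads):

```python
# Speedup and parallel efficiency from the 10-iteration totals in the
# benchmark log above (FMA3 FFT length 18M on this machine).
totals = {1: 0.932, 2: 0.483, 3: 0.355, 4: 0.310}  # threads -> seconds

base = totals[1]
for threads, t in sorted(totals.items()):
    speedup = base / t           # how much faster than 1 thread
    efficiency = speedup / threads  # fraction of ideal linear scaling
    print(f"{threads} threads: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

Going from 1 to 2 threads is nearly ideal (about 96% efficiency), but by 4 threads efficiency has dropped to roughly 75%, which matches the memory-bandwidth crossover point MrRepunit and axn describe.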

TheMawn 2014-06-25 21:57

No. I would venture a guess that it takes more than a year for a CPU to become trustworthy for a one-year test. A human has a much better capacity for "do it once, do it again better" than a computer. Just because a CPU has worked for one year flawlessly on individual 3-day tests doesn't mean it can or will work one year flawlessly on a one-year test.

If a CPU goes for three or four years without making a single error, I might then begin to trust its ability to complete a year-long project. By then, however, I should certainly hope CPUs have doubled in speed...

Mark Rose 2014-06-25 22:32

[QUOTE=TheMawn;376734]If a CPU goes for three or four years without making a single error, I might then begin to trust its ability to complete a year-long project. By then, however, I should certainly hope CPU's have doubled...[/QUOTE]

That's actually unlikely. Cosmic rays flip bits. [url]http://hardware.slashdot.org/story/13/11/22/159212/elevation-plays-a-role-in-memory-error-rates[/url]

kracker 2014-06-25 22:50

[QUOTE=TheMawn;376734]No. I would venture a guess that it takes more than a year for a CPU to become trustworthy for a one-year test. A human has a much better capacity for "do it once, do it again better" than a computer. Just because a CPU has worked for one year flawlessly on individual 3-day tests doesn't mean it can or will work one year flawlessly on a one-year test.

If a CPU goes for three or four years without making a single error, I might then begin to trust its ability to complete a year-long project. By then, however, I should certainly hope CPU's have doubled...[/QUOTE]

:huh: Don't overclock? Do you think CPUs are that error-prone at stock?

retina 2014-06-25 23:36

[QUOTE=TheMawn;376725]By the time a CPU has been field tested long enough to be trusted with something like that, a CPU which could do the test in half the time and more reliably has already been released anyway.[/QUOTE]It doesn't matter how long you test it and experience no errors because you still can't trust it. It is not a matter of proving your system is perfect. It is a matter of hoping you don't fall victim to a stray alpha particle, or a stray cosmic ray, flipping a bit or two. In fact you may have already experienced this but the flipped bit(s) were in unused memory and thus had no affect. When running a single larger test you are using more memory and exposing yourself to more likelihood of having such flipped bits land within your test set. So even though you ran multiple DCs the data size is much smaller so you can't simply add up all the runtimes and claim it equivalent to a single large LL.

