mersenneforum.org (https://www.mersenneforum.org/index.php)
-   PrimeNet (https://www.mersenneforum.org/forumdisplay.php?f=11)
-   -   Most efficient cores to use for 100-million-digit LL (https://www.mersenneforum.org/showthread.php?t=19450)

lpmurray 2014-06-25 10:49

Most efficient cores to use for 100-million-digit LL
 
I was wondering: if I decided to do a 100-million-digit LL test, what is the most cores I could use before just wasting cores? Say on a 16-core machine, will using all 16 really tear through the work, or is there no better return than using 8 or 4 cores?

retina 2014-06-25 10:57

[QUOTE=lpmurray;376686]I was wondering: if I decided to do a 100-million-digit LL test, what is the most cores I could use before just wasting cores? Say on a 16-core machine, will using all 16 really tear through the work, or is there no better return than using 8 or 4 cores?[/QUOTE]This has an easy answer: a single core is the most efficient. So running 16 separate tests in parallel on a 16-core machine will yield the greatest number of tests completed per unit time. But this configuration will not yield the fastest time for a single LL test. To get one test done as quickly as possible, using all 16 cores on a single exponent will most likely produce an answer in the shortest time. It is up to you to decide at what point the diminishing returns set in, where adding more cores only shaves a few minutes off the overall wall-clock runtime.

MrRepunit 2014-06-25 12:05

[QUOTE=retina;376687]This has an easy answer: a single core is the most efficient. So running 16 separate tests in parallel on a 16-core machine will yield the greatest number of tests completed per unit time. But this configuration will not yield the fastest time for a single LL test. To get one test done as quickly as possible, using all 16 cores on a single exponent will most likely produce an answer in the shortest time. It is up to you to decide at what point the diminishing returns set in, where adding more cores only shaves a few minutes off the overall wall-clock runtime.[/QUOTE]


Unfortunately it is not that easy. In principle I agree with you that running the tests single-threaded is the most efficient. But on newer CPUs (Intel Core i7-2600, Core i7-4770, Xeon E5-2670) I have observed the following behavior: running several workers single-threaded gives lower throughput than running fewer workers multithreaded. On the other hand, a single job with 20 threads is not faster than one using 4 threads; there is a crossover point that depends on the memory bandwidth and the CPU. Even hyperthreading might help, but often it just slows everything down. So it has to be tested in each specific case.

To find the largest throughput (assuming you are testing many exponents of roughly the same size), use the internal benchmark. Put the following lines in prime.txt:
MinBenchFFT=<minimum FFT size in KB>
MaxBenchFFT=<maximum FFT size in KB>
BenchHyperthreads=1
BenchMultithreads=1

Replace the bracketed values with the FFT sizes you want and run the benchmark. Then choose the configuration with the largest total throughput (iter/sec).
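
For example, for a 100-million-digit exponent such as M332610809, which runs at an 18M FFT length (18432K, as seen in the benchmark log later in this thread), the entries might look like this. The values below are only an illustration, not a recommendation:

```
MinBenchFFT=18432
MaxBenchFFT=18432
BenchHyperthreads=1
BenchMultithreads=1
```

Pinning the minimum and maximum to the same size restricts the benchmark to the one FFT length you actually care about, so it runs faster.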

I hope this helps a bit...

axn 2014-06-25 14:46

[QUOTE=retina;376687]This has an easy answer: A single core is the most efficient. [/QUOTE]

Not necessarily so. AVX(2) processors are _severely_ memory-bottlenecked. Multiple tests running in parallel can step on each other's toes and decrease efficiency to the point that running fewer tests multithreaded might still come out ahead. Benchmark your configuration.

EDIT: It is also the case that, when multithreading, the default FFT length need not be the fastest. A larger FFT length can sometimes be faster.

chris2be8 2014-06-25 15:49

Would ECC memory make the bottleneck worse? How likely is a 100M test to complete successfully with/without ECC memory?

Chris

TheMawn 2014-06-25 19:29

[QUOTE=chris2be8;376707]Would ECC memory make the bottleneck worse? How likely is a 100M test to complete successfully with/without ECC memory?

Chris[/QUOTE]

ECC memory is generally slower, so it would make an existing bottleneck worse. But I would think ECC memory is a must for any test that takes months rather than weeks.

Every single DC I have done with this CPU has come up with a matching residue, and I've been running it long enough that two cores working together could probably have finished a 100-million-digit test by now. In that respect, my CPU could probably pull one off fine without ECC memory.

On the other hand, a single error in a one-year time frame would be enough to seriously doubt the integrity of a test that long. Either I haven't hit that once-per-year error yet, or my machine really is rock solid, but it's impossible to say which. Any CPU that [B]has[/B] produced errors should not be used.

By the time a CPU has been field-tested long enough to be trusted with something like that, a CPU that could do the test in half the time, and more reliably, has already been released anyway.

TL;DR: I don't know the exact numbers, but I don't think it's worth the risk to run a 332M+ exponent without ECC.

kracker 2014-06-25 19:47

[QUOTE=TheMawn;376725]
By the time a CPU has been field tested long enough to be trusted with something like that, a CPU which could do the test in half the time and more reliably has already been released anyway.[/QUOTE]

:huh:

[code]
[Jun 25 12:46] Worker starting
[Jun 25 12:46] Setting affinity to run worker on any logical CPU.
[Jun 25 12:46] Resuming primality test of M36032093 using FMA3 FFT length 1920K, Pass1=768, Pass2=2560
[Jun 25 12:46] Iteration: 2880392 / 36032093 [7.99%].
[Jun 25 12:46] Stopping primality test of M36032093 at iteration 2880848 [7.99%]
[Jun 25 12:46] Worker stopped.
[Jun 25 12:46] Worker starting
[Jun 25 12:46] Using FMA3 FFT length 18M, Pass1=1536, Pass2=12K
[Jun 25 12:46] p: 332610809. Time: 91.753 ms.
[Jun 25 12:46] p: 332610809. Time: 92.025 ms.
[Jun 25 12:46] p: 332610809. Time: 92.532 ms.
[Jun 25 12:46] p: 332610809. Time: 97.722 ms.
[Jun 25 12:46] p: 332610809. Time: 94.215 ms.
[Jun 25 12:46] p: 332610809. Time: 91.979 ms.
[Jun 25 12:46] p: 332610809. Time: 93.627 ms.
[Jun 25 12:46] p: 332610809. Time: 93.320 ms.
[Jun 25 12:46] p: 332610809. Time: 92.159 ms.
[Jun 25 12:46] p: 332610809. Time: 92.396 ms.
[Jun 25 12:46] Iterations: 10. Total time: 0.932 sec.
[Jun 25 12:46] Estimated time to complete this exponent: 358 days, 16 hours, 24 minutes.
[Jun 25 12:46] Using FMA3 FFT length 18M, Pass1=1536, Pass2=12K, 2 threads
[Jun 25 12:46] p: 332610809. Time: 47.963 ms.
[Jun 25 12:46] p: 332610809. Time: 47.947 ms.
[Jun 25 12:46] p: 332610809. Time: 48.065 ms.
[Jun 25 12:46] p: 332610809. Time: 48.333 ms.
[Jun 25 12:46] p: 332610809. Time: 48.325 ms.
[Jun 25 12:46] p: 332610809. Time: 48.028 ms.
[Jun 25 12:46] p: 332610809. Time: 48.003 ms.
[Jun 25 12:46] p: 332610809. Time: 48.373 ms.
[Jun 25 12:46] p: 332610809. Time: 49.163 ms.
[Jun 25 12:46] p: 332610809. Time: 48.811 ms.
[Jun 25 12:46] Iterations: 10. Total time: 0.483 sec.
[Jun 25 12:46] Estimated time to complete this exponent: 185 days, 22 hours, 37 minutes.
[Jun 25 12:46] Using FMA3 FFT length 18M, Pass1=1536, Pass2=12K, 3 threads
[Jun 25 12:46] p: 332610809. Time: 34.743 ms.
[Jun 25 12:46] p: 332610809. Time: 35.860 ms.
[Jun 25 12:46] p: 332610809. Time: 35.324 ms.
[Jun 25 12:46] p: 332610809. Time: 35.353 ms.
[Jun 25 12:46] p: 332610809. Time: 35.379 ms.
[Jun 25 12:46] p: 332610809. Time: 35.399 ms.
[Jun 25 12:46] p: 332610809. Time: 36.720 ms.
[Jun 25 12:46] p: 332610809. Time: 35.252 ms.
[Jun 25 12:46] p: 332610809. Time: 35.280 ms.
[Jun 25 12:46] p: 332610809. Time: 35.407 ms.
[Jun 25 12:46] Iterations: 10. Total time: 0.355 sec.
[Jun 25 12:46] Estimated time to complete this exponent: 136 days, 13 hours, 17 minutes.
[Jun 25 12:46] Using FMA3 FFT length 18M, Pass1=1536, Pass2=12K, 4 threads
[Jun 25 12:46] p: 332610809. Time: 31.609 ms.
[Jun 25 12:46] p: 332610809. Time: 31.396 ms.
[Jun 25 12:46] p: 332610809. Time: 31.041 ms.
[Jun 25 12:46] p: 332610809. Time: 31.134 ms.
[Jun 25 12:46] p: 332610809. Time: 30.889 ms.
[Jun 25 12:46] p: 332610809. Time: 31.028 ms.
[Jun 25 12:46] p: 332610809. Time: 30.719 ms.
[Jun 25 12:46] p: 332610809. Time: 30.670 ms.
[Jun 25 12:46] p: 332610809. Time: 30.823 ms.
[Jun 25 12:46] p: 332610809. Time: 30.896 ms.
[Jun 25 12:46] Iterations: 10. Total time: 0.310 sec.
[Jun 25 12:46] Estimated time to complete this exponent: 119 days, 10 hours, 2 minutes.
[Jun 25 12:46] Worker stopped.
[/code]

In one year CPU speeds double?
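
For what it's worth, the scaling in that log can be summarized. Here is a quick sketch (my own addition, not from the original post) using the approximate average ms/iteration for each thread count, read off the benchmark above (total time / 10 iterations):

```python
# Approximate average ms/iteration for M332610809 (FFT length 18M),
# taken from the benchmark log above.
times_ms = {1: 93.2, 2: 48.3, 3: 35.5, 4: 31.0}

single = times_ms[1]
for threads in sorted(times_ms):
    speedup = single / times_ms[threads]       # vs. one thread
    efficiency = speedup / threads             # ideal is 100%
    print(f"{threads} thread(s): {speedup:.2f}x speedup, "
          f"{efficiency:.0%} parallel efficiency")
```

On these numbers, efficiency drops from roughly 96% at 2 threads to about 75% at 4 threads, which is exactly the memory-bandwidth crossover MrRepunit and axn describe.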

TheMawn 2014-06-25 21:57

No. I would venture a guess that it takes more than a year of use for a CPU to become trustworthy for a one-year test. A human has a much better capacity for "do it once, then do it again better" than a computer. Just because a CPU has worked flawlessly for a year on individual 3-day tests doesn't mean it can or will work flawlessly for a year on a single one-year test.

If a CPU goes three or four years without making a single error, I might then begin to trust its ability to complete a year-long project. By then, however, I should certainly hope CPU speeds have doubled...

Mark Rose 2014-06-25 22:32

[QUOTE=TheMawn;376734]If a CPU goes for three or four years without making a single error, I might then begin to trust its ability to complete a year-long project. By then, however, I should certainly hope CPU's have doubled...[/QUOTE]

That's actually unlikely. Cosmic rays flip bits. [url]http://hardware.slashdot.org/story/13/11/22/159212/elevation-plays-a-role-in-memory-error-rates[/url]

kracker 2014-06-25 22:50

[QUOTE=TheMawn;376734]No. I would venture a guess that it takes more than a year for a CPU to become trustworthy for a one-year test. A human has a much better capacity for "do it once, do it again better" than a computer. Just because a CPU has worked for one year flawlessly on individual 3-day tests doesn't mean it can or will work one year flawlessly on a one-year test.

If a CPU goes for three or four years without making a single error, I might then begin to trust its ability to complete a year-long project. By then, however, I should certainly hope CPU's have doubled...[/QUOTE]

:huh: Don't overclock? Do you really think CPUs are that error-prone at stock?

retina 2014-06-25 23:36

[QUOTE=TheMawn;376725]By the time a CPU has been field-tested long enough to be trusted with something like that, a CPU that could do the test in half the time, and more reliably, has already been released anyway.[/QUOTE]It doesn't matter how long you test it and experience no errors; you still can't trust it. It is not a matter of proving your system is perfect. It is a matter of hoping you don't fall victim to a stray alpha particle, or a stray cosmic ray, flipping a bit or two. In fact you may have already experienced this, but the flipped bit(s) were in unused memory and thus had no effect. When running a single larger test you are using more memory, and so exposing yourself to a greater likelihood of such flipped bits landing within your test's data. So even though you ran multiple DCs, their data size is much smaller, and you can't simply add up all the runtimes and claim that is equivalent to a single large LL.

Batalov 2014-06-26 00:06

[QUOTE=retina;376745]In fact you may have already experienced this, but the flipped bit(s) were in unused memory and thus had no effect.[/QUOTE]
Aw, the famous black-cat-in-a-pitch-black-room argument! Love it! :rajula:

TheMawn 2014-06-26 00:14

I'm saying that there is always some unknown chance of a single-bit error. Overclocking has nothing to do with it.

What I am trying to say is that you can never guarantee a mistake will not be made. You can only become reasonably confident, and that takes time.


Suppose you do twelve months of DCs to stability-test your system, and you get zero errors. All you can really say is that your odds of failure in any one month are [B]less than[/B] one in twelve, if you treat each month as statistically independent. In other words, your odds of succeeding for one month are [B]greater than[/B] 91.7%.

Now, [Greater than 0.917][SUP]12[/SUP] = [Greater than 35.2%]. These are your odds of getting twelve consecutive successes.

If I tell you that your odds of having a successful LL are better than a third, are you really inspired with confidence?


"Now wait, I might have gone twelve months without an error, but did I not also go for 30 million seconds? My odds of failing any particular second are minuscule, so doesn't the interval length matter?"

Of course it does. However, this boils down to something very familiar. Some people reading this might already know where I'm going.

(1 - 1/30,000,000)[SUP]30,000,000[/SUP] = 36.79%. Your odds of not failing an iteration for 30,000,000 seconds are still only better than a third. Note that they could be anywhere above a third, including 100.000%. But that's all we can say for sure.

You can keep making your results better by taking smaller intervals, but your odds of success are simply tending to e[SUP]-1[/SUP].


Now, if you go 24 months without an error, your odds of success per month are > 95.8%. Raise that to the 12th power (you're only doing a one-year test) and you get > 60%. Want to try seconds instead? You get > 60.6%.


If you want a formula, your odds of a successful year-long test after x years without a failure are only greater than e[SUP]-1/x[/SUP].

60.6% after two years.
81.8% after five years.
99.0% after 100 years.
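
These bounds are easy to check numerically. A quick sketch (my addition, using only the numbers from the post above):

```python
import math

n = 30_000_000  # roughly the seconds in a year of error-free testing

# Lower bound on surviving n intervals when the per-interval failure
# probability is at most 1/n; this tends to e^-1 as intervals shrink.
print((1 - 1 / n) ** n, math.exp(-1))

# Lower bound e^(-1/x) on completing a one-year test
# after x error-free years.
for years in (2, 5, 100):
    print(years, math.exp(-1 / years))
```

The loop reproduces the 60.6%, 81.8%, and 99.0% figures listed above.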


EDIT: I started writing this and was interrupted, and there were some replies in the meantime. Retina of course makes a good point: the larger test has more chances of failure per unit of time all by itself. So really, your chances of completing a one-year test after two years of error-free testing are less than "greater than 60.6%" :razz:

kladner 2014-06-26 01:23

Just to backtrack:
[QUOTE]Every single DC I have done with this CPU has come up with a matching residue[/QUOTE]This is sheer luck. You have DC'd only tests that were correct the first time. (That is much easier to accept than the idea that you and the first tester(s) independently arrived at the same wrong residue.) A mismatched residue would not prove anything one way or the other about the correctness of your own machine's performance.

potonono 2014-06-26 03:09

I protect my calculations from cosmic rays with a thick layer of dust over all the internal PC components. :smile:

retina 2014-06-26 03:14

[QUOTE=potonono;376770]I protect my calculations from cosmic rays with a thick layer of dust over all the internal PC components. :smile:[/QUOTE]Except that your layer of dust has impurities and emits alpha particles thus nullifying the effect you seek to obtain. :razz:

TheMawn 2014-06-26 03:21

What I am trying laboriously to say, though certain individuals persist in trying to misunderstand me, is that as a personal example, despite my CPU having as clean a record as possible over a year, I would not trust it with a 100 million digit LL.

kracker 2014-06-26 15:17

[QUOTE=TheMawn;376773]What I am trying laboriously to say, though certain individuals persist in trying to misunderstand me, is that as a personal example, despite my CPU having as clean a record as possible over a year, I would not trust it with a 100 million digit LL.[/QUOTE]

I think you're misunderstanding us. Who says we have to test our CPUs for over a year, and why would we? Simple: stay at stock. Do you really think CPUs are that error-prone? A 332M test takes around 120 days on a Haswell with 4 threads.
Maybe I am missing something.

retina 2014-06-26 15:31

[QUOTE=kracker;376799]Maybe I am missing something.[/QUOTE]ECC.

kladner 2014-06-27 01:28

[QUOTE=retina;376800]ECC.[/QUOTE]

The longer the test, the more you need it.

