|
|
#12 |
|
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3×29×83 Posts |
I'm only thinking two or three parallel tests.
An int is 4 bytes; two or three of them, or even ten, come to at most 40 bytes, and 40 bytes takes no longer to transfer than 4 bytes. FFTlen × 8 bytes, for a typical current double-check exponent, is around 12.6 MB. I imagine 126 MB (ten tests' worth) would take perhaps a few ms longer to transfer, but keep in mind that (on my 460) each iteration itself is only around 6-7 ms... But then, such a heavy transfer would only occur at checkpoint iterations... I suppose it is doable... but there are a lot of other challenges that would have to be solved... |
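To put rough numbers on the sizes above, here is a minimal sketch of the arithmetic. The FFT length (1536K) and the effective host-device bandwidth (6 GB/s) are assumptions picked for illustration, not figures from the post; the FFT length was chosen only because it reproduces the quoted 12.6 MB. The estimated transfer time depends heavily on the bandwidth you plug in.

```python
# Sketch: size and rough transfer time of the per-test residue data.
# ASSUMPTIONS (not from the post): a 1536K FFT length and 6 GB/s of
# effective transfer bandwidth. Both are illustrative placeholders.

BYTES_PER_DOUBLE = 8
ASSUMED_FFT_LEN = 1536 * 1024      # hypothetical FFT length (1,572,864 doubles)
ASSUMED_BANDWIDTH = 6e9            # hypothetical bytes per second


def residue_bytes(fft_len, tests=1):
    """Bytes needed to hold the FFT data for `tests` parallel tests."""
    return fft_len * BYTES_PER_DOUBLE * tests


def transfer_ms(n_bytes, bandwidth=ASSUMED_BANDWIDTH):
    """Rough transfer time in milliseconds, ignoring latency and overhead."""
    return n_bytes / bandwidth * 1e3


if __name__ == "__main__":
    for tests in (1, 3, 10):
        size = residue_bytes(ASSUMED_FFT_LEN, tests)
        print(f"{tests:2d} test(s): {size / 1e6:6.1f} MB, "
              f"~{transfer_ms(size):.1f} ms at the assumed bandwidth")
```

Since the big transfer would only happen at checkpoint iterations, its cost is spread over however many iterations lie between checkpoints, rather than being paid against every 6-7 ms iteration.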
|
|
|
|
|
#13 | |
|
Dec 2011
11·13 Posts |
Quote:
The key thing, I think, is that CUDALucas uses the NVIDIA-provided FFT routines. I believe NVIDIA is very proud of their FFT routines, and I believe they have optimized their FFT routines to make good use of the parallel hardware. The data at http://mersenne-aries.sili.net/cudalucas.php shows that the performance scales well with the amount of parallel hardware on a card. Despite some of their marketing literature, NVIDIA's cards aren't magic. Running 1000 Mersenne numbers in parallel isn't going to yield 1000x throughput of an algorithm that already exploits the parallelism of the hardware. |
|
|
|
|
|
|
#14 | |
|
Romulan Interpreter
Jun 2011
Thailand
9664₁₀ Posts |
Quote:
You can get some speed increase if the expo is small (small FFT size) and the card is not maxed out by a single instance. In this case (same as for mfaktc, where more instances are always necessary) you can get some more output by running more instances, but the additional output comes from additional occupancy of the card. When the card is 97-100% occupied, there is no way to increase the output. The next step to increase the output, if you really want it, would be overclocking the card, but that is generally not profitable and not advisable (without liquid cooling, cheap or free electricity, etc). Overclocking results in a small increase in output and a huge increase in consumed power, as has been discussed many times around here. The stability of the system decreases without good/expensive cooling equipment, and if you get a mismatch from time to time, all your increased output was in vain. If you increase your output by 10% but get one bad residue in 10, then in total you gained nothing, except a 30% higher electricity bill for the whole period. Don't ask how we know that...

Edit: numeric example: GTX580 clocked at 782MHz: LLDC for a 26M expo, one instance, takes about 17 hours, and the card is 97% busy. Two tests in serial (one after the other) would take about 34 hours. All times were rounded up; the card is actually a bit faster under these conditions. Running both tests in parallel, two different CL instances will give an average ETA of 36 hours for the two tests to finish (average because usually one test finishes a bit faster, resulting in "dead times" between the two instances, assuming you have equal worktodo files, which wastes even more time).

Last fiddled with by LaurV on 2012-08-14 at 06:00 |
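The two numeric claims in this post can be checked with a few lines of arithmetic. This is only a sketch under a simplifying assumption that is not in the post: a bad residue is treated as a completely wasted test, and nothing else (power cost, triple checks) is modelled. The function names are made up for illustration.

```python
# Sketch of the arithmetic behind the overclocking and parallel-instance claims.
# ASSUMPTION (not from the post): a bad residue counts as one fully wasted test.


def effective_throughput(speedup, bad_fraction):
    """Rate of good results relative to stock: speedup times share of good residues."""
    return speedup * (1.0 - bad_fraction)


def parallel_penalty_hours(single_hours, n_tests, parallel_hours):
    """Extra hours spent running n_tests side by side instead of one after another."""
    return parallel_hours - single_hours * n_tests


if __name__ == "__main__":
    # 10% more output, one bad residue in 10.
    print(f"relative good output: {effective_throughput(1.10, 0.10):.2f}")
    # GTX580 example: 17 h per test, 2 tests, 36 h when run in parallel.
    print(f"parallel penalty: {parallel_penalty_hours(17, 2, 36)} hours")
```

Under this model the overclocked card delivers slightly less good output than the stock card, and the two parallel instances finish later than the same two tests run back to back, which is the point both examples make.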
|
|
|
|
|
|
#15 | |
|
Romulan Interpreter
Jun 2011
Thailand
2⁶×151 Posts |
Quote:
Last fiddled with by LaurV on 2012-08-14 at 07:37 |
|
|
|
|