|
|
#419 | |
|
Sep 2010
Annapolis, MD, USA
3068 Posts |
Quote:
|
|
|
|
|
|
|
#420 | |
|
"James Heinrich"
May 2004
ex-Northern Ontario
10B5₁₆ Posts |
Quote:
|
|
|
|
|
|
|
#421 |
|
Sep 2008
Kansas
59×67 Posts |
It seems the wavefront for TF is currently in the M80mil range, while the wavefront for both P-1 and LL is about M53mil. Would it make sense to get the fine fellas over at the GPU farm to help with some M54-M70mil and run the TF to one (or two) bit levels higher than needed to offset the P-1 work (at least on a temporary basis)?
Yes, I know there is another TF level after P-1 but what about having them run an extra one or two levels? It takes me nearly three days to complete a P-1. It seems it takes the GPU program a matter of hours to complete the next bit level. Would completing an extra bit level be comparable to a good P-1? I admit, I don't know the math behind all of this. |
|
|
|
|
|
#422 |
|
"James Heinrich"
May 2004
ex-Northern Ontario
1000010110101₂ Posts |
No. But an extra 4-5 levels might be.
A good P-1 is normally somewhere in the 5-7% chance-of-finding-a-factor range. Assuming that M53xxxxxx is normally TF'd to 2^68, if you take it an extra 4 levels to 2^72 you'll have a 5.75% chance of finding a factor for about 17GHz-days worth of work. An equivalent P-1 requires only 2.7GHz-days of effort. If you do 5 extra levels of TF, to 2^73, you get 7.14% chance of factor for 35GHz-days effort, compared to the equivalent P-1 which takes only 5.6GHz-days of effort. So if you're talking about doing more TF vs P-1 on the same CPU, it's clear that P-1 is the winner (6x+ faster for same probability). However, it is entirely possible that many systems have GPUs that can pump out TF >6x faster than the CPU, so despite the apparent inefficiency, it could still be faster to get the same probability of factor doing TF than P-1. Not to mention that TF requires almost no RAM, whereas P-1 has extensive RAM requirements. |
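The figures in the post above can be roughly reproduced with a short sketch. It assumes the common rule of thumb that the chance of a factor between 2^b and 2^(b+1) is about 1/b; the post does not state which formula was used, so treat that as an assumption. The function names are made up for illustration, and the GHz-days costs are simply copied from the post.

```python
def tf_factor_chance(from_bits: int, to_bits: int) -> float:
    """Approximate chance that TF from 2^from_bits up to 2^to_bits finds a factor.

    Assumption (not stated in the post): each bit level b contributes
    roughly 1/b, summed over the levels from_bits .. to_bits - 1.
    """
    return sum(1.0 / b for b in range(from_bits, to_bits))


def effort_ratio(tf_ghz_days: float, pm1_ghz_days: float) -> float:
    """How many times more work TF needs than the equivalent P-1."""
    return tf_ghz_days / pm1_ghz_days


# (target bit level, TF cost, equivalent P-1 cost), costs as quoted in the post
cases = [(72, 17.0, 2.7), (73, 35.0, 5.6)]

for to_bits, tf_cost, pm1_cost in cases:
    chance = tf_factor_chance(68, to_bits)
    ratio = effort_ratio(tf_cost, pm1_cost)
    print(f"TF 2^68 -> 2^{to_bits}: ~{chance:.2%} chance, "
          f"TF costs ~{ratio:.1f}x the equivalent P-1")
```

Under that assumption the two chances come out close to the quoted 5.75% and 7.14%, and both effort ratios are a little over 6, which is where the "6x+" figure comes from. A GPU therefore has to do TF more than about 6x faster than the CPU before extra TF beats P-1 on raw throughput.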
|
|
|
|
|
#423 |
|
P90 years forever!
Aug 2002
Yeehaw, FL
8279₁₀ Posts |
GPU trial factorers are very welcome in the 50-60M range. If you polish off say 3 bits of TF, then P-1 will probably choose lower bounds, saving P-1 time. Plus, you'll find factors that some P-1'ers would have missed and even more factors that LL-testers-with-minimal-memory would have found. This will save LL testing time.
The calculations are horrific with weird feedback loops, so one cannot quantify any gains or say with any certainty what the optimal use of GPUs should be. However, I can say with certainty, the more GPUs, P-1ers, and LLers working below 60M the sooner we will finish this range and the sooner we will find the next Mersenne prime! |
|
|
|
|
|
#424 |
|
"James Heinrich"
May 2004
ex-Northern Ontario
1000010110101₂ Posts |
Would it be possible to have a manual assignment type of "GPU TF", which you (George) can tweak to be whatever is most useful at the moment? For now, that would be handing out exponents that are normally in the P-1 queue, for TF to 2 bits higher than normal (for example). That would simplify the choice for GPU-TF'ers, who would know where to be most useful. And with those exponents officially "assigned", they won't be accidentally handed out for P-1 at the lower bounds (or worse, LL) before the TF results come back.
Last fiddled with by James Heinrich on 2011-02-08 at 02:36 |
|
|
|
|
|
#425 |
|
"Lucan"
Dec 2006
England
1100101001010₂ Posts |
I presume you refer to the 47+1.293th prime expected before 80M.
David
Last fiddled with by davieddy on 2011-02-09 at 05:00 |
|
|
|
|
|
#426 | |
|
"Oliver"
Mar 2005
Germany
5×223 Posts |
Quote:
E.g. an exponent which would usually be TF'd to 2^67 before P-1 and to 2^68 after P-1 ==> TF it to 2^{69,70} before P-1. I would like to see assignments of this type including multiple bit levels (up to the last bit level) at once!
Oliver |
|
|
|
|
|
|
#427 | |
|
"Richard B. Woods"
Aug 2002
Wisconsin USA
2²·3·641 Posts |
Quote:
OTOH, once GPUs are doing L-L, the GPU:TF/GPU:LL tradeoff point changes again, perhaps back to near the CPU:TF/CPU:LL tradeoff. Then there's the GPU:TF/CPU:P-1/GPU:LL tradeoff and the GPU:TF/GPU:P-1/CPU:LL tradeoff and the GPU:TF/GPU:P-1/GPU:LL tradeoff, plus the GPU:TF/CPU:P-1/GPU:TF/GPU:LL tradeoff and the GPU:TF/CPU:P-1/CPU:TF/GPU:LL tradeoff and the ... Hmmm...
Last fiddled with by cheesehead on 2011-02-10 at 12:10 Reason: Once GPUs do both TF and L-L ... and P-1, ... |
|
|
|
|
|
|
#428 | |
|
"Vincent"
Apr 2010
Over the rainbow
101101101000₂ Posts |
On my GTX 460, M52100981 got from 69 to 72 in a bit less than 6 hours.
Quote:
Last fiddled with by firejuggler on 2011-02-10 at 11:40 |
|
|
|
|
|
|
#429 | ||
|
"James Heinrich"
May 2004
ex-Northern Ontario
7×13×47 Posts |
Quote:
Quote:
For comparison, if you go for the top-end and get an i7-2600K overclocked to 4.5GHz, you'll get roughly 30GHz-days/day of LL/P-1 work across 4 cores, or 25GHz-days/day of TF work (above 2^66). That is a "good" video card (GTX460) vs a "very good" CPU (SB i7-2600K OC).
Comparing a "good" CPU vs a "medium" GPU, let's compare my i7-920 @ 3.1GHz vs 8800GT:
CPU: 14GHz-days/day across 4 cores LL/P-1; 15GHz-days/day for TF
GPU: my current assignment of M332222641 from 2^76-77 will give 46GHz-days in 71 hours, so around 15.5GHz-days/day.
Remember, of course, that GPU-TF (using mfaktc) still chews up 1-2 cores of CPU time as well (faster card, more CPU required). Nevertheless, on my lower-end GPU it's still a 4x TF performance increase (one CPU core TF = 3.8GHzd/d; one CPU core + GPU TF = 15.5GHzd/d). On the high-end comparison, two CPU core TF = 12.5GHzd/d; two CPU core + GPU TF = 64GHzd/d for a 5x throughput increase, despite using an extra CPU core. (You could argue those cores are better at FFT, and might only be a 4x increase).
For a final comparison, let's compare a lower-end CPU with a higher-end GPU, and a top-end CPU with a lower-end GPU.
a) Let's first use firejuggler's GTX460 again with an i3-530 @ stock (3GHz): 6GHzd/d in LL/P-1; 7GHzd/d in TF. Both cores will be used feeding the GPU, but you get 64GHzd/d TF throughput for it, so a 9x-10x increase.
b) i7-2600K + 8800GT: you trade 7.3GHzd/d per CPU core of LL work for 15.5GHzd/d GPU-TF work. On a system level that's 30GHzd/d vs 22+15 so only a 1.25x overall throughput increase.
So from my limited sample of examined benchmarks, it can vary widely, from break-even (fast CPU, slow GPU) to 10x throughput increase (fast GPU, slow CPU). |
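The throughput comparisons in the post above are plain ratios, so they can be re-derived in a few lines. This is only a sketch: every number is copied from the post, and the helper names and scenario labels are made up for illustration. For case (b), the "22+15" is treated as 22 + 15.5, which appears to be what gives the quoted 1.25x.

```python
def ghz_days_per_day(ghz_days: float, hours: float) -> float:
    """Convert a credit total earned over some hours into a daily rate."""
    return ghz_days / hours * 24.0


def speedup(with_gpu: float, cpu_only: float) -> float:
    """Throughput ratio of the GPU-assisted setup over the CPU-only one."""
    return with_gpu / cpu_only


# 8800GT: 46 GHz-days of TF credit in 71 hours (from the post)
rate_8800gt = ghz_days_per_day(46.0, 71.0)
print(f"8800GT TF rate: ~{rate_8800gt:.1f} GHz-days/day")

# (GPU-assisted GHzd/d, CPU-only GHzd/d) for each scenario in the post
scenarios = {
    "i7-920, one core TF vs one core + 8800GT": (15.5, 3.8),
    "i7-2600K, two cores TF vs two cores + GTX460": (64.0, 12.5),
    "i3-530, both cores TF vs both cores + GTX460": (64.0, 7.0),
    "i7-2600K, all-CPU LL vs three cores LL + 8800GT": (22.0 + 15.5, 30.0),
}

for label, (with_gpu, cpu_only) in scenarios.items():
    print(f"{label}: ~{speedup(with_gpu, cpu_only):.2f}x")
```

The last scenario mixes LL and TF credit on one scale, as the post does, so its ratio describes overall GHz-days throughput rather than a like-for-like TF speedup.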
||
|
|
|