[QUOTE=bcp19;293990]Using exponents in the ranges he suggests will end up using different FFTs, which is why he said to do 25/50/75M exps as an example.[/QUOTE]
OK. He did say "exponents". I didn't completely connect. Thanks.
[QUOTE=kladner;293994]OK. He did say "exponents". I didn't completely connect. Thanks.[/QUOTE]
Oh man. What a screw up. You know you're a geek when... :headdesk:
[QUOTE=chalsall;293875]
However, something to start discussing... Once we have effectively cleared out the "wave", should we race ahead of it, or come back and start going to 73 below 58.52? [/QUOTE]

[B]Race ahead is more profitable and time-efficient[/B] (I can argue why, but you know my reasoning already; I have said it many times).

In fact, with the new CudaLucas, TF-ing at the DC front makes NO SENSE AT ALL any more. One DC now takes 18 hours or less, depending on your card, with a bit of FFT-length and thread tuning for each expo. For TF-ing to 69 bits to be more profitable, you have to find a factor every 120 GHz-days (GD) on an average card. GPU-2-72 has found [strike]1071[/strike] [B]1072[/B] DC factors [B]since its conception[/B], and spent [B]208K GD[/B] on them, that is roughly [B]195 GD per factor[/B], well [U][B]UNDER[/B][/U] the profitability line (assuming we had had the current fast CL back when we started TF-ing DC candidates).

As I have always advocated, GPU owners should:
- at the DC front, do DC LL!
- at the LL front, do TF!
- and generally, TF per bit level is more profitable the higher the exponent (you save two LL tests instead of only one, you save some P-1 too, TF per bit level is faster at higher exponents, etc., etc.).

edit: someone found a new factor while I was writing this post, hehe... Since I am editing it anyway, I will use the opportunity to argue (again) why TF-to-72/73 is still profitable at the LL front, using the same GPU-to-72 statistics: we found 1483 factors, spending 785K GD on them, that is a bit less than 530 GD per factor. A top-end card like a GTX 580 gets that much mfaktc work done in a day and a half. Say 2 days, or say 3 or even 4 days for a totally lazy card. To break even, the same card would have to finish 2 ([B]TWO[/B]) LL tests on a 45M exponent, AND some P-1, in that time. That would be about 50 hours for ONE LL test at 45M, which is not yet possible even with the best card. Which shows that doing TF is still more profitable at this level, for exponents with NO LL done.

If one LL test is already done, the question becomes debatable. Some good cards could do the LL in 60-80 hours, which [U]could[/U] be faster than finding a factor. But things get MUCH worse for LL as the exponent rises, while they get MUCH better for TF (at the same bit level), which gets faster as the exponent rises and has about the same chance of finding a factor per unit of time spent.
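The per-factor averages above can be checked directly from the quoted GPU-to-72 totals. A minimal sketch, using only the numbers in the post (the comparison against the break-even budget depends on your card and is not computed here):

```python
# Average trial-factoring cost per found factor, from the GPU-to-72
# statistics quoted in the post above.

def ghzdays_per_factor(total_ghzdays, factors_found):
    """Average GHz-days of TF spent per factor found."""
    return total_ghzdays / factors_found

# DC front: 1072 factors found for ~208,000 GHz-days
dc_cost = ghzdays_per_factor(208_000, 1072)   # roughly 194 GD per factor

# LL front: 1483 factors found for ~785,000 GHz-days
ll_cost = ghzdays_per_factor(785_000, 1483)   # roughly 529 GD per factor

# A factor at the LL front saves two LL tests plus some P-1, so its
# break-even budget is much larger than the one-DC budget at the DC front.
print(f"DC front: {dc_cost:.0f} GD per factor")
print(f"LL front: {ll_cost:.0f} GD per factor")
```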
Chalsall, idea: We/You released some higher up expos awaiting P-1 on the theory that we can always come back to them later when our queue is shorter. But, if we still fill up on other expos of the same size, then we won't have room in the q to grab lower expos. Therefore, how about Spidey grabs any expos already at 72 without P-1 that are lower than any in our q, up to a limit of (say) 1000 (keeping in mind that that's not even two weeks of work for our current throughput). How hard would that be?
|
[QUOTE=flashjh;293932]Why 1.3?[/QUOTE]
Because when I tried compiling newer versions, I had problems. Do you think v1.69 can be used on a board with cc 1.3? I'll try and let you know...

Here are the results:

v1.69
[code]Iteration 6000 M( 45009487 )C, 0x8cd5213b34f29c69, n = 2621440, CUDALucas v1.69 err = 0.0625 (0:49 real, 48.9579 ms/iter, ETA 612:00:50)[/code]
v1.3
[code]Iteration 27840000  23.4 msec/Iter  M( 45009487 )C, 0x0b164fcde8cd4925, n = 4194304, CUDALucas v1.3[/code]
Luigi
[QUOTE=James Heinrich;293952]I'd like to put together a CUDAlucas performance comparison chart[/QUOTE]Thanks to those who have submitted data, but I need more data points, please. :smile:
After looking over a few benchmark results, I'm going to standardize and ask that everyone submit results using v1.69 on three specific exponents:
[code]CUDAlucas -polite 0 26214400
CUDAlucas -polite 0 52428800
CUDAlucas -polite 0 78643200[/code]
And (important), I need to know what FFT size was used. You may see it start with a smaller FFT size at first and then move up if the error is too high:
[quote]C:\Prime95\cudalucas>CUDALucas_169_20 -polite 0 26214400
[color=red]start M26214400 fft length = 1310720
iteration = 22 < 1000 && err = 0.26196 >= 0.25, increasing n from 1310720[/color]
[color=blue]start M26214400 fft length = 1572864[/color]
[color=gray]Iteration 10000 M( 26214400 )C, 0x0344448e4bf0eb62, n = 1572864, CUDALucas v1.69 err = 0.02403 (0:31 real, 3.0623 ms/iter, ETA 22:17:12)[/color]
[b]Iteration 20000[/b] M( 26214400 )C, 0x9f4a57b1f324d325, n = 1572864, CUDALucas v1.69 err = 0.02403 (0:30 real, [b]3.0247 ms/iter[/b], ETA 22:00:15)[/quote]
For consistency, I'm using the timing data as reported at iteration 20000. So for anyone willing to run (or re-run) benchmark data for me, please:
* use v1.69 ([url=http://www.mersenneforum.org/showpost.php?p=293735&postcount=1062]Windows binaries here[/url])
* use the exact 3 command lines above
* send me the output from the start through iteration 20000 (as in the example above).
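For anyone collecting several of these results, here is a hypothetical helper (`parse_benchmark` is my own name, not part of any tool) that pulls the exponent, FFT size, and ms/iter out of the iteration-20000 line; the line format is copied from the v1.69 sample output above:

```python
import re

# Hypothetical helper for tabulating benchmark submissions: extract
# exponent, FFT size, and ms/iter from a CUDALucas v1.69 output line.
# The format below is taken from the sample output in the post.

LINE = ("Iteration 20000 M( 26214400 )C, 0x9f4a57b1f324d325, n = 1572864, "
        "CUDALucas v1.69 err = 0.02403 (0:30 real, 3.0247 ms/iter, ETA 22:00:15)")

def parse_benchmark(line):
    m = re.search(r"M\( (\d+) \).*?n = (\d+),.*?([\d.]+) ms/iter", line)
    if not m:
        return None
    exponent, fft, ms = m.groups()
    return {"exponent": int(exponent), "fft": int(fft), "ms_per_iter": float(ms)}

print(parse_benchmark(LINE))
```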
[QUOTE=Dubslow;294012]Therefore, how about Spidey grabs any expos already at 72 without P-1 that are lower than any in our q, up to a limit of (say) 1000 (keeping in mind that that's not even two weeks of work for our current throughput). How hard would that be?[/QUOTE]
Trivial. Which is why Spidy already does exactly that... :smile:

The queue is currently 800 candidates in size. When we hold more than 800 unassigned, it is the highest ones that get released.
[QUOTE=LaurV;294002][B]Race ahead is more profitable and time-efficient[/B] (I can argue why, but you know my reasoning already, I have said it many times).
In fact, with the new CudaLucas, TF-ing at DC front makes NO SENSE AT ALL already. One DC takes now 18 hours or less, depending on your card, with a bit of FFT length and threads tuning for each expo. For TF-ing to 69 bits to be more profitable, you have to find a factor every 120 GHzDays (GD) for an average card. GPU-2-72 found [strike]1071[/strike] [B]1072 [/B]DC factors [B]since its conception[/B], and spent [B]208K GD[/B] for them, that is roughly [B]195 GD per factor[/B], much [U][B]UNDER[/B][/U] the profitable floating line (assuming we would have had the actual fast CL at that time we started TF for DC candidates). [/QUOTE]

You're mixing apples and oranges here: your figures include DC exponents that have been factored as high as 72 bits. You are correct that the cost per factor is higher than your 120 GHz-days, but it is not as high as you have listed.

Also, the higher the exponent gets, the more GHz-days the LL will take and the fewer GHz-days the TF will take. For example: a 25M exponent is 21.42 GHzD, a 30M exponent is 31.25, and a 35M is 43.75. So a blunt "you have to find a factor within 120 GHzDays" only applies up to a certain exponent level. By your own logic, with 25M as the baseline, a 30M exponent takes roughly 50% more GHzD, so TF is profitable at up to 180 GHzD per factor, and at 35M it is profitable at up to 240 GHzD per factor.
[QUOTE=LaurV;294002]In fact, with the new CudaLucas, TF-ing at DC front makes NO SENSE AT ALL already. One DC takes now 18 hours or less, depending on your card, with a bit of FFT length and threads tuning for each expo. For TF-ing to 69 bits to be more profitable, you have to find a factor every 120 GHzDays (GD) for an average card. GPU-2-72 found [strike]1071[/strike] [B]1072 [/B]DC factors [B]since its conception[/B], and spent [B]208K GD[/B] for them, that is roughly [B]195 GD per factor[/B], much [U][B]UNDER[/B][/U] the profitable floating line (assuming we would have had the actual fast CL at that time we started TF for DC candidates).[/QUOTE]
I keep coming back to this and it keeps bothering me, and I think I finally figured out why: the numbers were not making sense.

First, the 'profitability of doing TF vs DC'. For TF to be profitable, you need to be able, on average, to find a factor in the same or less time than it takes to perform a DC, correct? I have tested 4 of my GPUs so far, and using the work/day information from JamesH's site, I come up with between 207 and 220 GD of mfaktc work in the time each could perform a DC. So where did the 120 come from?

The 580 is listed at 316.2 GD/day. Between the quote above and other threads, I've come up with between 18 and 19.5 hours to complete a DC on a 580. If we just go with 18, then 0.75 * 316.2 = 237.15. The 195 GD per factor quoted above falls quite nicely under the 237 GD a DC would take. I have to ask, LaurV: did you take half of this, thinking there were 2 tests to be saved instead of the one DC?
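The GTX 580 arithmetic above, spelled out (the 316.2 GD/day rate, 18-hour DC time, and 195 GD per factor are the figures from the posts; substitute your own card's numbers):

```python
# How many GHz-days of mfaktc work a GTX 580 can do in the time one
# double-check takes, versus the observed TF cost per DC-front factor.

rate_gd_per_day = 316.2   # mfaktc throughput of a GTX 580 (GD/day)
dc_hours = 18.0           # time for one DC on the same card
gd_per_factor = 195.0     # observed TF cost per DC-front factor

tf_budget = rate_gd_per_day * dc_hours / 24.0  # GD of TF doable per DC-time

print(f"TF budget per DC: {tf_budget:.2f} GD")
print("TF at DC front profitable:", gd_per_factor < tf_budget)
```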
[QUOTE=chalsall;294031]Trivial. Which is why Spidy already does exactly that... :smile:
The queue is currently 800 candidates in size. Those which are released when we have more than 800 not assigned are the highest.[/QUOTE]

Can I vote for 1000? 800 is only 10 days' work, and 1000 is such a nice number... :smile:

(I did mean reclaiming lower expos from PrimeNet that we had previously TFed and that are lower than those currently in the queue... I can't tell whether or not you understood that :razz: The first part makes me think you did, but the second part makes me think you didn't.)
[QUOTE=Dubslow;294083]Can I vote for 1000? 800 is only 10 days' work, and 1000 is such a nice number... :smile:[/QUOTE]
OK. 1000 it is.

[QUOTE=Dubslow;294083](I did mean reclaim lower expos from PrimeNet that we had previously TFd that are lower than those currently in the queue... I can't tell whether or not you understood that :razz: (The first part makes me think you did, but the second part makes me think you didn't))[/QUOTE]

Yes, I understood what you meant, and that's how the system works. Even an exponent we previously TFed to 72 and then released, or one that someone else TFed to 72, will be grabbed back if possible whenever Spidy sees something "interesting". In this case, that means anything below the maximum of what we currently hold which still needs P-1 work.