On a completely different topic...
I monitor the LMH progress and indications are that in the last few months its progress has slowed noticeably... I suspect that is a direct result of more and more people getting GPUs and working on other ranges (i.e. GPUto72), along with people getting the message that GPUs are so much more efficient at TF that some without GPUs are moving to other work.
|
My 2 cents worth...
[QUOTE=chalsall;293875]You are.
However, something to start discussing... Once we have effectively cleared out the "wave", should we race ahead of it, or come back and start going to 73 below 58.52? Probably we'll do what we're now doing in the DC range -- some are having fun taking higher ranges from 67 to 68, while "Pete" is taking many candidates below 29.69 to 70 rather than the nominal 69.[/QUOTE] Take into account that I do NOT have a GPU (yet)... Continue to help LL and DC progress... that is, continue to go to higher bits in the LL wave (which I might extend to 62M), including those LL tests that get released... And similarly in the DC wave. |
[QUOTE=petrw1;293886]I monitor the LMH progress and indications are that in the last few months its progress has slowed noticeably...I suspect that is a direct result of more and more people getting GPUs and working on other ranges (i.e. GPUto72), along with people getting the message that with GPUs being so much more efficient at TF that some without GPUs and moving to other work.[/QUOTE]
The wave has slowed because it is 64 to 66 that is being done not the previous 64 to 65. |
[QUOTE=chalsall;293875]
However, something to start discussing... Once we have effectively cleared out the "wave", should we race ahead of it, or come back and start going to 73 below 58.52? [/QUOTE] CUDALucas. While the extra TF would be nice, clearing all expos (not just the factorables) is the project's main goal. (Of course, I think people would prefer to have a nice, really stable version of CuLu that just plain works, while a few people continue with development versions.) |
[QUOTE=gjmccrac;293890]The wave has slowed because it is 64 to 66 that is being done not the previous 64 to 65.[/QUOTE]
Which would take 3 times as long as 64-65, but I believe it has slowed more than that. Based on the rate it processed the 500 and 600M ranges, it should have taken 3 weeks to complete 800M to 65 bits [url]http://www.mersenneforum.org/showpost.php?p=288475&postcount=377[/url] and 9 weeks to 66 bits. 800M started on Feb 6... 9 weeks takes us to April 9th. Currently it is processing just over 10,000 per day. 339,000 left = 33 more days = April 23. Hmmm... maybe within the tolerance range of my guess-timates. |
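petrw1's date arithmetic above reduces to a quick back-of-envelope check. A sketch (the rate and backlog figures are from the post; the "today" of March 21, 2012 is my assumption from the thread's timeframe):

```python
from datetime import date, timedelta

# Figures as stated in the post; the start date is an assumption.
remaining = 339_000      # candidates left in the 800M range
rate_per_day = 10_000    # current processing rate
today = date(2012, 3, 21)  # assumed "today" (thread timeframe)

days_left = remaining // rate_per_day   # 33 more days
eta = today + timedelta(days=days_left)  # lands on April 23
```
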
[QUOTE=bcp19;293876]Edit: Would it be difficult to make the site do this automatically? I normally select Lowest exponent, since there is a lowest to 70 selection, how about the program ignore the factor to box on lowest exponent and give out assignments below the cutoff to 69 and above to 70?[/QUOTE]
Done. Now, unless you choose the "Lower Exponents to 70" option, any candidate below 29.69M will be issued with a "to 69" worktodo line even if you've "pledged" to go to 70. |
Can the same be done for LLTF at 58.xx as well?
|
[QUOTE=axn;293881]I suspect that the answer would turn out to be "both" :smile: Build up a buffer ahead of the wave while simultaneously speeding up the wave.[/QUOTE]
Yup... I agree. Exactly like what's happening now in the DC range. [QUOTE=axn;293881]Maybe even prep some EFF prize candidates (~332M range)?[/QUOTE] The system could very easily facilitate that -- literally less than half an hour of work. Anyone interested? I would break the "Work Saved" statistics out, since a single factor found would save about 5,000 GHz Days of LL work... |
[QUOTE=Dubslow;293899]Can the same be done for LLTF at 58.xx as well?[/QUOTE]
Already done, in so far as if you choose "What makes sense" and pledge to go to 73, the system will only issue work above 58.52M. What I have done for the DCTF form will only be needed for the LLTF form once there are few available candidates below 58.52M. |
[QUOTE=chalsall;293902]The system could very easily facilitate that <332M> -- literally less than half an hour of work. Anyone interested?
I would break the "Work Saved" statistics out, since a single factor found would save about 5,000 GHz Days of LL work...[/QUOTE] Personally, I think Uncwilly is doing a good job managing that range... [url]http://www.mersenneforum.org/showthread.php?t=10693[/url] and I know several of the participants there have GPUs. That being said, he probably won't object to help. |
[QUOTE=Dubslow;293895]CUDALucas. While the extra TF would be nice, clearing all expos (not just the factorables) is the project's main goal. (Of course, I think people would prefer to have a nice, really stable version of CuLu that just plain works, while a few people continue with development versions.)[/QUOTE]
Agreed. However, I would really like to see a proper analysis done on where exactly the "curves cross" as far as how much TFing a GPU can do before it takes longer to find a factor than to run a LL test. I suspect, based on, for example, [URL="http://www.gpu72.com/reports/worker/2423ae6e8f696d5e7d1447de91ca35a6/"]LaurV's[/URL] statistics, that it would actually be "profitable" to go at least one, perhaps two, additional bits, but it would be nice to have hard data rather than a gut feel. James, I know you were collecting such data for GPU TFing -- have you also collected data on GPU LLing? |
[QUOTE=petrw1;293904]Personally, I think Uncwilly is doing a good job managing that range...
[url]http://www.mersenneforum.org/showthread.php?t=10693[/url] and I know several of the participants there have GPUs. That being said, he probably won't object to help.[/QUOTE] They've talked about it [URL="http://www.mersenneforum.org/showthread.php?p=280003#post280003"]before[/URL], and like Uncwilly said, there's no point at the moment. GPU272 was conceived to make sure that expos didn't get handed out as LLs. What difference would it make? (Wait, pretty graphs! Of course!) [QUOTE=chalsall;293906]Agreed. However, I would really like to see a proper analysis done on where exactly the "curves cross" as far as how much TFing can a GPU do before it takes longer to find a factor than to run a LL test. I suspect, based on for example, [URL="http://www.gpu72.com/reports/worker/2423ae6e8f696d5e7d1447de91ca35a6/"]LaurV's[/URL] statistics, that it would actually be "profitable" to go at least one, perhaps two, additional bits, but it would be nice to have hard data rather than a gut feel. James, I know you were collecting such data for GPU TFing -- have you also collected data on GPU LLing?[/QUOTE] For LaurV, at least, he's been running expos through CuLu, and stopping and restarting both as he tweaks hardware settings and as new versions of CuLu come out. Keep in mind that the last month or so I would expect that our CuLu output has been quite a bit lower than otherwise. It went through a period of hard testing when no one was really sure what versions were stable or not, and LaurV and flash (at least) were mostly focused on testing rather than throughput. Fortunately, CuLu seems to have solidified recently; it comes back to what I said earlier about having an absolutely stable version for the majority to use. (I think that might be a few versions down the road, but those reasons belong in the CuLu thread, not here.) |
[QUOTE=Dubslow;293908]They've talked about it [URL="http://www.mersenneforum.org/showthread.php?p=280003#post280003"]before[/URL], and like Uncwilly said, there's no point at the moment. GPU272 was conceived to make sure that expos didn't get handed out as LLs. What difference would it make?[/QUOTE]
Hey, that's cool. Trivial for me to do, but I'm not wishing to step on anyone's toes. |
[QUOTE=chalsall;293906]have you also collected data on GPU LLing?[/QUOTE]No, I haven't.
I've run exactly one exponent through CUDAlucas. I found it to be highly version-dependent: v1.64 was roughly 100% faster than v1.2 on my 570 (~3ms/it vs ~6ms/it at somewhere around 26M). So for a DC candidate in that range, it takes about 22h to LL, so you should spend about 20 minutes (3.9 GHz-days) on TF, which is [url=http://mersenne-aries.sili.net/credit.php?worktype=TF&exponent=26000000&frombits=1&tobits=69]roughly 2^69[/url]. |
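James's 20-minute figure is consistent with the usual rule of thumb that a factor in bit level b turns up with probability roughly 1/b. A sketch of that reasoning (my framing, not an official PrimeNet formula):

```python
# Rule of thumb (assumption): a factor between 2^(b-1) and 2^b appears with
# probability ~1/b, so the last TF bit is worth doing while its cost stays
# below (1/b) * (LL time saved).
ll_hours = 22        # time for a DC of a ~26M exponent on this card (from the post)
tests_saved = 1      # a DC candidate saves only one remaining test
bit = 69             # the bit level under consideration

budget_minutes = ll_hours * 60 * tests_saved / bit   # ~19 min, i.e. "about 20 minutes"
```
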
[QUOTE=James Heinrich;293912]No, I haven't.[/QUOTE]
Could you? [QUOTE=James Heinrich;293912]I've run exactly one exponent through CUDAlucas. I found it to be highly version-dependent: v1.64 was roughly 100% faster than v1.2 on my 570 (~3ms/it vs ~6ms/it at somewhere around 26M). So for a DC candidate in that range, it takes about 22h to LL so you should spend about 20 minutes (3.9GHz-days) on TF, which is [url=http://mersenne-aries.sili.net/credit.php?worktype=TF&exponent=26000000&frombits=1&tobits=69]roughly 2^69[/url].[/QUOTE] Good to know. But I would argue the "curves" change for each 1M range. It would be nice to start collecting statistics on exponent, wall-clock time, hardware and SW version. |
[QUOTE=chalsall;293911]Hey, that's cool. Trivial for me to do, but I'm not wishing to step on anyone's toes.[/QUOTE]
Now that you're willing, I think the shinies might persuade him :wink: (we really need a better smiley for wink) [QUOTE=chalsall;293914] Good to know. But I would argue the "curves" change for each 1M range. It would be nice to start collecting statistics on exponent, wall-clock time, hardware and SW version.[/QUOTE] I would reargue that any serious data analysis should wait until the dev changes slow down. I think they have, but we should wait a week or so to be sure. (James, among the changes, somewhere in the 1.6 series, Prime95 came along and made some optimizations, 7% improvement right off the bat :smile:) |
[QUOTE=chalsall;293790][URL="http://www.gpu72.com/reports/estimated_completion/primenet/"]View added.[/URL] Answer: less than 200 days to Trial Factor everything to the new GPU levels up to 60M for every candidate not already LLed.
Wow!!! GPUs are just amazing!!! :smile:[/QUOTE] Any chance we can get that chart for P-1 factoring as well? I'm kinda curious what our throughput is. (It would also be nice to know PrimeNet's throughput in the area that petrw1 previously estimated.) |
[QUOTE=Dubslow;293919]Any chance we can get that chart for P-1 factoring as well? I'm kinda curious what our throughput is. (It would also be nice to know PrimeNet's throughput in the area that petrw1 previously estimated.)[/QUOTE]
I knew you were going to ask... :smile: Yes, that's already planned for in the data definitions and code, just not done yet. Unfortunately, calculating PrimeNet's P-1 throughput is not as easy, but I might have thought of a way to do so. |
[QUOTE=chalsall;293920]I knew you were going to ask... :smile: Yes, that's already planned for in the data definitions and code, just not done yet.
[/quote]Ah, but I think you're just as curious as me/the rest of us, and so asking was only a formality ;) [QUOTE=chalsall;293920]Unfortunately, calculating PrimeNet's P-1 throughput is not as easy, but I might have thought of a way to do so.[/QUOTE] I wouldn't have asked if you didn't already have mersenne.info running... :razz: |
[QUOTE=chalsall;293914]Could you?
Good to know. But I would argue the "curves" change for each 1M range. It would be nice to start collecting statistics on exponent, wall-clock time, hardware and SW version.[/QUOTE] CuLu has come a long way. The assessment about the trial period is correct. We had a lot of testing for a while. Newer versions are stable and fast. Originally a DC took ~24 hours; now with 1.69 I can run a DC in ~15 hours. That's without specific code optimizations for CUDA and shader model. As an aside, I'm willing to do TF in the 332M range once we're caught up. |
[QUOTE=flashjh;293926]CuLu has come a long way. The assessment about the trial period is correct. We had a lot of testing for a while. Newer versions are stable and fast. Original DC took ~24 hours, now with 1.69 I can run a DC in ~15 hours. That's without specific code optimizations for CUDA and shader model.
As an aside, I'm willing to do TF in the 332M range once we're caught up.[/QUOTE] Too bad that my GTX285 must stay with version 1.3... Luigi |
[QUOTE=ET_;293931]Too bad that my GTX285 must stay with version 1.3...
Luigi[/QUOTE] Why 1.3? |
[QUOTE=chalsall;293906]However, I would really like to see a proper analysis done on where exactly the "curves cross" as far as how much TFing can a GPU do before it takes longer to find a factor than to run a LL test. I suspect, based on for example, [URL="http://www.gpu72.com/reports/worker/2423ae6e8f696d5e7d1447de91ca35a6/"]LaurV's[/URL] statistics, that it would actually be "profitable" to go at least one, perhaps two, additional bits, but it would be nice to have hard data rather than a gut feel.[/QUOTE]
This will be different for pre-Kepler vs Kepler. Most likely, Kepler will end up doing 1 more bit compared to the previous gens, since the relative DP/integer performance is much worse than others, thus favoring mfaktc over CUDALucas even more. |
[QUOTE=chalsall;293914]Could you?[/QUOTE]I could. I'll need a wider variety of data samples than the single one I have. It would be great if anyone reading this thread could fire up CUDAlucas and PM/email me some iteration times for a variety of exponent sizes (at least 25M, 50M and 75M). If possible, a variety of CUDAlucas versions would also be interesting. Naturally I'd also need to know what GPU you're using (and at what clock speed, if overclocked (whether factory or by yourself)).
|
[QUOTE=James Heinrich;293951]I could. I'll need a wider variety of data samples than the single one I have. If anyone reading this thread could fire up CUDAlucas and PM/email me some iteration times for a variety of exponent sizes (at least 25M, 50M and 75M would be great). If possible, a variety of CUDAlucas versions would also be interesting. Naturally I'd also need to know what GPU you're using (and at what clock speed, if overclocked (whether factory or by yourself)).[/QUOTE]
I will arrange to get some test results, with details.....in a bit. I just reconfigured my balance between CPU & GPU. I now have 4x P-1 on the CPU, and 1 each mfaktc and CuLu 1.69 on the GTX 460. I'm still observing that change. Once I have some idea how this is working, I'll do a run with CuLu sans mfaktc. @James H., if you want other tests, just say so. |
[QUOTE=kladner;293974]Once I have some idea how this is working, I'll do a run with CuLu sans mfaktc[/QUOTE]I don't need anything fancy. Really, just a 60-second test run of each exponent range would be fine, just give me the average per-iteration timing. No need to do a full run, I just need timings for assorted FFT sizes (25M, 50M, 75M should give me a good starting point).
|
Got it. I'll work on it soon.
|
Why such large sizes?
|
[QUOTE=James Heinrich;293976]..... I just need timings for assorted FFT sizes (25M, 50M, 75M should give me a good starting point).[/QUOTE]
Sorry. On reflection, I'm not sure now how to proceed. I've been studying, and experimenting with FFT sizes, but I don't really understand what to feed to CuLu to get the results you want. Does this correspond to exponent sizes in some way? |
For FFT length, just type in 25*2^20, 50*2^20, etc... in full (decimal) form (Wolfram Alpha or google them). (I believe you specify length with -f. You can also just use -r to automatically test a whole bunch of sizes.)
|
[QUOTE=kladner;293982]Sorry. On reflection, I'm not sure now how to proceed. I've been studying, and experimenting with FFT sizes, but I don't really understand what to feed to CuLu to get the results you want. Does this correspond to exponent sizes in some way?[/QUOTE]
Using exponents in the ranges he suggests will end up using different FFTs, which is why he said to do 25/50/75M exps as an example. |
[QUOTE=bcp19;293990]Using exponents in the ranges he suggests will end up using different FFTs, which is why he said to do 25/50/75M exps as an example.[/QUOTE]
OK. He did say "exponents". I didn't completely connect. Thanks. |
[QUOTE=kladner;293994]OK. He did say "exponents". I didn't completely connect. Thanks.[/QUOTE]
Oh man. What a screw up. You know you're a geek when... :headdesk: |
[QUOTE=chalsall;293875]
However, something to start discussing... Once we have effectively cleared out the "wave", should we race ahead of it, or come back and start going to 73 below 58.52? [/QUOTE] [B]Racing ahead is more profitable and time-efficient[/B] (I can argue why, but you know my reasoning already, I have said it many times). In fact, with the new CudaLucas, TF-ing at the DC front makes NO SENSE AT ALL any more. One DC now takes 18 hours or less, depending on your card, with a bit of FFT length and thread tuning for each expo. For TF-ing to 69 bits to be more profitable, you have to find a factor every 120 GHzDays (GD) for an average card. GPU-2-72 found [strike]1071[/strike] [B]1072[/B] DC factors [B]since its conception[/B], and spent [B]208K GD[/B] on them, that is roughly [B]195 GD per factor[/B], well [U][B]UNDER[/B][/U] the profitable floating line (assuming we had had the current fast CL at the time we started TF for DC candidates). As I have always advocated, GPU owners should: - at the DC front, do DCLL! - at the LL front, do TF! - and generally, TF is more profitable per bit level as the exponent gets higher (it saves two LLs instead of only one, saves some P-1 too, TF is faster per bit level at higher exponents, etc, etc, etc). edit: someone found a new factor while I was writing this post, hehe... Now, since I'm editing it anyway, I'll use the opportunity to argue (again) why TF-to-72/73 is still profitable at the LL front, using the same GPU-to-72 statistics: we found 1483 factors, spending 785k GD on them, that is a bit less than 530 GD per factor. A top-end card like a gtx580 gets this with mfaktc in a day and a half. Say 2 days, or say 3 or even 4 days for a totally lazy card. So, to beat that, the same card would have to finish 2 ([B]TWO[/B]) LL tests of a 45M exponent, AND some P-1, in the same time. That would be about 50 hours for ONE LL test at 45M, which is not yet possible even with the best card. Which shows that doing TF is still more profitable at this level, for exponents with NO LL done. 
If one LL test is already done, the question becomes debatable. Some good cards can do the LL in 60-80 hours, which [U]could[/U] be faster than finding a factor. But things get MUCH worse for LL as the exponent rises, and MUCH better for TF (at the same bit level), which is faster as the exponent rises and has about the same chance of finding a factor per unit of time spent. |
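LaurV's per-factor figures reduce to simple division; a sketch using the totals quoted in the post:

```python
# GPU-to-72 aggregate statistics as quoted in the post.
dc_factors, dc_ghz_days = 1072, 208_000    # DC-front factors found, effort spent
ll_factors, ll_ghz_days = 1483, 785_000    # LL-front factors found, effort spent

gd_per_dc_factor = dc_ghz_days / dc_factors   # ~194 GHz-days per DC-front factor
gd_per_ll_factor = ll_ghz_days / ll_factors   # ~529 GHz-days per LL-front factor
```

Whether ~195 GD per factor is over or under the break-even line depends on the DC time assumed, which is exactly what bcp19 disputes below.
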
Chalsall, idea: We/You released some higher-up expos awaiting P-1 on the theory that we can always come back to them later when our queue is shorter. But if we still fill up on other expos of the same size, then we won't have room in the queue to grab lower expos. Therefore, how about Spidey grabs any expos already at 72 without P-1 that are lower than any in our queue, up to a limit of (say) 1000 (keeping in mind that that's not even two weeks of work at our current throughput). How hard would that be?
|
[QUOTE=flashjh;293932]Why 1.3?[/QUOTE]
Because when I tried compiling newer versions, I had problems. Do you think v.1.69 can be used on a board with cc 1.3? I'll try and let you know... Here are the results: v1.69 [code] Iteration 6000 M( 45009487 )C, 0x8cd5213b34f29c69, n = 2621440, CUDALucas v1.69 err = 0.0625 (0:49 real, 48.9579 ms/iter, ETA 612:00:50) [/code] v1.3 [code] Iteration 27840000 23.4 msec/Iter M( 45009487 )C, 0x0b164fcde8cd4925, n = 4194304, CUDALucas v1.3 [/code] Luigi |
[QUOTE=James Heinrich;293952]I'd like to put together a CUDAlucas performance comparison chart[/QUOTE]Thanks to those who have submitted data, but I need more data points, please. :smile:
After looking over a few benchmark results, I'm going to standardize and ask that everyone submit results using v1.69 on three specific exponents:[code]CUDAlucas -polite 0 26214400 CUDAlucas -polite 0 52428800 CUDAlucas -polite 0 78643200[/code]And (important), I need to know what FFT size was used. You may see it start with a smaller FFT size at first and then move up if the error is too high:[quote]C:\Prime95\cudalucas>CUDALucas_169_20 -polite 0 26214400 [color=red]start M26214400 fft length = 1310720 iteration = 22 < 1000 && err = 0.26196 >= 0.25, increasing n from 1310720[/color] [color=blue]start M26214400 fft length = 1572864[/color] [color=gray]Iteration 10000 M( 26214400 )C, 0x0344448e4bf0eb62, n = 1572864, CUDALucas v1.69 err = 0.02403 (0:31 real, 3.0623 ms/iter, ETA 22:17:12)[/color] [b]Iteration 20000[/b] M( 26214400 )C, 0x9f4a57b1f324d325, n = 1572864, CUDALucas v1.69 err = 0.02403 (0:30 real, [b]3.0247 ms/iter[/b], ETA 22:00:15)[/quote]For consistency, I'm using the timing data as reported on iteration 20000. So for anyone willing to run (or re-run) benchmark data for me, please: * use v1.69 ([url=http://www.mersenneforum.org/showpost.php?p=293735&postcount=1062]Windows binaries here[/url]) * use the exact 3 commandlines above * send me the output from start to 20000 iteration (as the above example). |
[QUOTE=Dubslow;294012]Therefore, how about Spidey grabs any expos already at 72 without P-1 that are lower than any in our q, up to a limit of (say) 1000 (keeping in mind that that's not even two weeks of work for our current throughput). How hard would that be?[/QUOTE]
Trivial. Which is why Spidy already does exactly that... :smile: The queue is currently 800 candidates in size. Those which are released when we have more than 800 not assigned are the highest. |
[QUOTE=LaurV;294002][B]Race ahead is more profitable and time-efficient[/B] (I can argue why, but you know my reasoning already, I have said it many times).
In fact, with the new CudaLucas, TF-ing at DC front makes NO SENSE AT ALL already. One DC takes now 18 hours or less, depending on your card, with a bit of FFT length and threads tuning for each expo. For TF-ing to 69 bits to be more profitable, you have to find a factor every 120 GHzDays (GD) for an average card. GPU-2-72 found [strike]1071[/strike] [B]1072 [/B]DC factors [B]since its conception[/B], and spent [B]208K GD[/B] for them, that is roughly [B]195 GD per factor[/B], much [U][B]UNDER[/B][/U] the profitable floating line (assuming we would have had the actual fast CL at that time we started TF for DC candidates). [/QUOTE] You're combining apples and oranges in your statement, your figures have DC exponents that have been factored as high as ^72. While you are correct in that the cost per factor is higher than your 120GHzDays, it is not as high as you have listed. Plus, the higher the exponent gets, the more GHzDays it will take the LL to run and fewer GHzDays the TF will take. Example: a 25M exp is 21.42GHzD, a 30M exp is 31.25 and a 35M is 43.75. So a blunt "you have to find a factor in 120GHzDays" only applies up to a certain exponent level. Using your logic and 25M as a basis, a 30M exp will take 50% more GHzD, so at 180GHzD per factor it is profitable to do a TF and at 35M, 240GHzD per factor it is profitable. |
[QUOTE=LaurV;294002]In fact, with the new CudaLucas, TF-ing at DC front makes NO SENSE AT ALL already. One DC takes now 18 hours or less, depending on your card, with a bit of FFT length and threads tuning for each expo. For TF-ing to 69 bits to be more profitable, you have to find a factor every 120 GHzDays (GD) for an average card. GPU-2-72 found [strike]1071[/strike] [B]1072 [/B]DC factors [B]since its conception[/B], and spent [B]208K GD[/B] for them, that is roughly [B]195 GD per factor[/B], much [U][B]UNDER[/B][/U] the profitable floating line (assuming we would have had the actual fast CL at that time we started TF for DC candidates).[/QUOTE]
I keep coming back to this and it keeps bothering me, and I think I finally figured out why. The numbers were not making sense. First, the 'profitability of doing TF vs DC'. In order for TF to be profitable, you need to be able, on average, to find a factor in the same or less time than it takes to perform a DC, correct? So, I have tested 4 of my GPUs so far, and using the information from JamesH's site for work/day, I come up with between 207 and 220 GD of mfaktc work in the same amount of time each could perform a DC. So where did the 120 come from? The 580 is listed at 316.2 GD, Between the quote above and looking at other threads, I've come up with between 18 and 19.5 hours to complete a DC on a 580. If we just go with 18, then .75*316.2 = 237.15. With the 195GD quoted above, this falls quite nicely under the 237GD a DC would take. I have to ask LaurV, did you take 1/2 of this thinking there were 2 tests needing to be saved instead of the one DC? |
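bcp19's break-even figure for the GTX 580 works out like this (figures as quoted in the post; whether 18 hours is the right DC time is part of the dispute):

```python
# Break-even TF budget: how many GHz-days of mfaktc work the card could do
# in the time one DC takes. Figures for a GTX 580 as quoted in the post.
tf_gd_per_day = 316.2    # mfaktc throughput per James Heinrich's site
dc_hours = 18            # estimated time for one DC on the same card

breakeven_gd = tf_gd_per_day * dc_hours / 24   # ~237 GHz-days per factor
# The observed ~195 GD/factor is below this, so by this measure DC-front TF
# still pays off, contrary to the 120 GD threshold quoted earlier.
```
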
[QUOTE=chalsall;294031]Trivial. Which is why Spidy already does exactly that... :smile:
The queue is currently 800 candidates in size. Those which are released when we have more than 800 not assigned are the highest.[/QUOTE] Can I vote for 1000? 800 is only 10 days' work, and 1000 is such a nice number... :smile: (I did mean reclaim lower expos from PrimeNet that we had previously TFd that are lower than those currently in the queue... I can't tell whether or not you understood that :razz: (The first part makes me think you did, but the second part makes me think you didn't)) |
[QUOTE=Dubslow;294083]Can I vote for 1000? 800 is only 10 days' work, and 1000 is such a nice number... :smile:[/QUOTE]
OK. 1000 it is. [QUOTE=Dubslow;294083](I did mean reclaim lower expos from PrimeNet that we had previously TFd that are lower than those currently in the queue... I can't tell whether or not you understood that :razz: (The first part makes me think you did, but the second part makes me think you didn't))[/QUOTE] Yes, I understood what you meant. And that's how the system works. If Spidy sees something "interesting" (even something we've previously TFed to 72 and then released, or something someone else TFed to 72), it will grab it back if possible. In this case, that means anything below the maximum of what we currently hold which needs P-1 work. |
Okay, just had to be sure :razz:
Now I'm sure. Thanks |
chalsall, can you put the "minimum TF level" back for P-1? I just tried to snag the lone 74-bit but couldn't :razz:
|
[QUOTE=Dubslow;294180]chalsall, can you put the "minimum TF level" back for P-1? I just tried to snag the lone 74-bit but couldn't :razz:[/QUOTE]
Instead I've added a "FactTo desc" to the "order by" clause. Try now.... :smile: |
chalsall, the LLTF function appears broken. I tried reserving under 51M, and about half of the assignments were >52M, with one in the 57M range. There are more available under 51M, but currently at a higher bit level.
|
[QUOTE=Dubslow;294225]chalsall, the LLTF function appears broken. I tried reserving under 51M, and about half of the assignments were >52M, with one in the 57M range. There are more available under 51M, but currently at a higher bit level.[/QUOTE]
What "Option" did you choose? If it was the default of "What makes sense" and your pledge was to 72, then you would have received the "Oldest". I sometimes change "What makes sense" to be what makes sense at the time. There are some old candidates reserved from PrimeNet coming back from P-1 workers who took work at below 72 which I wanted to clear out. The other options, "Lowest Exponent", etc, always do what they say they'll do. |
[QUOTE=Dubslow;294180]chalsall, can you put the "minimum TF level" back for P-1? I just tried to snag the lone 74-bit but couldn't :razz:[/QUOTE]
I believe if you fill in "Optional range" of 57M to 57M and ask for at least 5 you should get that one. |
2 more GPUto72 milestones.....
Over 3,000 factors found.
Over 10,000 P-1 completions |
[QUOTE=James Heinrich;294026]Thanks to those who have submitted data, but I need more data points, please. :smile:[/QUOTE]Thanks to everyone who has submitted data; I now have a pretty good picture of CUDALucas throughput, and can currently predict timings +/- 6% for pretty much any card.
[url]http://mersenne-aries.sili.net/cudalucas.php[/url] Performance depends on GFLOPS, FFT size, and compute version. As with mfaktc, compute 2.0 is best, 2.1 is second-best and 1.3 is slowest. |
[QUOTE=James Heinrich;294257]Thanks everyone who has submitted data, I now have a pretty good picture of CUDALucas throughput, and can currently predict timings +/- 6% for pretty much any card.[/QUOTE]
Coolness. Thanks James!!! Now, the question at hand... Where do the curves cross to guide people as to how deep should they TF at a particular candidate level vs. doing a LL run? |
[QUOTE=chalsall;294261]Coolness. Thanks James!!!
Now, the question at hand... Where do the curves cross to guide people as to how deep should they TF at a particular candidate level vs. doing a LL run?[/QUOTE] Eyeballing the stats (LL vs TF), I see the TF earning about 9x the credit/day vs LL. Assuming CPUs earn roughly the same credit/day for both LL & TF, this represents 3 additional bits of TF (7x additional effort in TF). No data for 680 -- I suspect it'll be closer to 4 additional bits. But what about those cards that can't LL?! |
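One way to read axn's arithmetic (my framing of it): each extra TF bit doubles the cumulative effort, so n extra bits cost 2^n - 1 times the cost of the first extra bit, and a 9x credit ratio buys 3 of them:

```python
import math

# n extra TF bits cost 1 + 2 + 4 + ... = 2^n - 1 times the cost of the
# first extra bit, so a card earning ratio-x the credit/day on TF vs LL
# can afford the largest n with 2^n - 1 <= ratio.
ratio = 9                                  # TF vs LL credit/day, eyeballed above
extra_bits = int(math.log2(ratio + 1))     # 3 additional bits
extra_effort = 2 ** extra_bits - 1         # the "7x additional effort" above
```
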
[QUOTE=axn;294282]Eyeballing the stats (LL vs TF), I see the TF earning about 9x the credit/day vs LL. Assuming CPUs earn roughly the same credit/day for both LL & TF, this represents 3 additional bits of TF (7x additional effort in TF). No data for 680 -- I suspect it'll be closer to 4 additional bits.
But what about those cards that can't LL?![/QUOTE] I have a feeling very few of those cards are being used by Gimpsters, seeing as they are GTS 2/3xx, GT 2/3xx and 8/9xxx models. Close to half of them produce less per day than a single core of an I5 2500k. |
[QUOTE=chalsall;294261]Now, the question at hand... Where do the curves cross to guide people as to how deep should they TF at a particular candidate level vs. doing a LL run?[/QUOTE]That's my next project. :smile:
|
[QUOTE=axn;294282]Eyeballing the stats (LL vs TF), I see the TF earning about 9x the credit/day vs LL. Assuming CPUs earn roughly the same credit/day for both LL & TF, this represents 3 additional bits of TF (7x additional effort in TF). No data for 680 -- I suspect it'll be closer to 4 additional bits.[/QUOTE]
Hmmm... Since DC-TF is saving only 1 LL test, that means DC-TF should do only 2 additional bits. Here are the default values, for reference. [code] 23390000 65 29690000 66 37800000 67 47450000 68 58520000 69 [/code] [QUOTE=bcp19;294292]I have a feeling very few of those cards are being used by Gimpsters, seeing as they are GTS 2/3xx, GT 2/3xx and 8/9xxx models. Close to half of them produce less per day than a single core of an I5 2500k.[/QUOTE] You're probably right -- nonetheless, it is an interesting question. There is no crossing of the lines, since there is only one line. That suggests an infinite number of additional bits :smile: |
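Reading the default table above as upper exponent bounds per depth (my interpretation, consistent with "candidates below 29.69M issued to 69", i.e. 66 + 3 GPU levels, earlier in the thread), a lookup sketch:

```python
# (bound, bits): assumed semantics -- exponents below each bound default to
# the listed TF depth; the 70 for exponents above the table is a guess.
DEFAULT_DEPTHS = [
    (23_390_000, 65),
    (29_690_000, 66),
    (37_800_000, 67),
    (47_450_000, 68),
    (58_520_000, 69),
]

def default_tf_bits(exponent):
    """Nominal TF depth for an exponent, per the table above (assumed semantics)."""
    for bound, bits in DEFAULT_DEPTHS:
        if exponent < bound:
            return bits
    return 70  # assumption for exponents above the table's range
```
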
[QUOTE=axn;294298]Hmmm... Since DC-TF is saving only 1 LL test, that means DC-TF should do only 2 additional bits.
[code] 23390000 65 29690000 66 37800000 67 47450000 68 58520000 69 [/code] You're probably right -- nonetheless, it is an interesting question. There is no crossing of the lines, since there is only one line. That suggests an infinite number of additional bits :smile:[/QUOTE] Why not use [url]http://mersenne-aries.sili.net/throughput.php[/url] to create another line with CPU data? |
[QUOTE=chalsall;294246]What "Option" did you choose? If it was the default of "What makes sense" and your pledge was to 72, then you would have received the "Oldest".
I sometimes change "What makes sense" to be what makes sense at the time. There are some old candidates reserved from PrimeNet coming back from P-1 workers who took work at below 72 which I wanted to clear out. The other options, "Lowest Exponent", etc, always do what they say they'll do.[/QUOTE] It shouldn't matter what option I chose, if I specified the range 0-51000000, then I shouldn't get any expos >51M regardless of option, unless that's wrong? It hasn't appeared to be wrong before. |
[QUOTE=chalsall;293785]We currently hold about 42 days of LL Trial Factoring work.[/QUOTE]
[QUOTE=chalsall;293785] less than 200 days to Trial Factor everything to the new GPU levels up to 60M for every candidate not already LLed.[/QUOTE] We shouldn't be losing ground, should we? Unless LL tests are being turned in quicker than we can eliminate candidates. Today GPU=59.5 and PrimeNet=226 |
[QUOTE=petrw1;294387]We shouldn't be losing ground, should we? Unless LL tests are being turned in quicker than we can eliminate candidates.
Today GPU=59.5 and PrimeNet=226[/QUOTE] Nope. What you're seeing here is twofold...
1. A very large batch of Xyzzy's work scrolled off the 30 day window.
2. Spidy is no longer throwing back candidates it finds at above 55M at 71 bits. This is to ensure that PrimeNet only issues work to LL workers TFed to at least 72.
We continue to release approximately 2.5 times as many candidates as are being claimed by LL workers. And remember that the "less than 200 days to TF everything up to 60M" still stands -- the chart is showing to 61M because we hold some. Right now we could do everything up to 60M in 162 days. |
How far should we TF in the DC range?
Point for discussion.
Now that James has put together his excellent [URL="http://mersenne-aries.sili.net/cudalucas.php?model=13&granularity=1"]mfaktc vs CUDALucas[/URL] analysis, it shows that we actually shouldn't start TFing to 70 in the DC range until about 33.5M, rather than the 29.69M we were doing based on Prime95/mprime's cross-over point. So my question to everyone is, should we change the GPU72 cross-over point to be 33.5M instead, even though we are currently very far ahead of the wave-front? This question is largely directed to bcp19, as he's the main person still doing the "to completion" work in the DC range. But I would also like feedback from others who have a strong opinion one way or the other. Another option, of course, is to nominally take things to 69 below 33.5M, but keep some just ahead of the wavefront to be available to those who want to go to 70. Thoughts? |
I think the "system" should facilitate what is optimal for GIMPS. Individual users can easily override the system defaults (for those exponents Spidy catches). Spidy should only pick up exponents based on the calculated (new) thresholds.
|
[QUOTE=chalsall;294566]Point for discussion.
Now that James has put together his excellent [URL="http://mersenne-aries.sili.net/cudalucas.php?model=13&granularity=1"]mfaktc vs CUDALucas[/URL] analysis, it shows that we actually shouldn't start TFing to 70 in the DC range until about 33.5M, rather than the 29.69M we were doing based on Prime95/mprime's cross-over point. So my question to everyone is, should we change the GPU72 cross-over point to be 33.5M instead, even though we are currently very far ahead of the wave-front? This question is largely directed to bcp19, as he's the main person still doing the "to completion" work in the DC range. But I would also like feedback from others who have a strong opinion one way or the other. Another option, of course is to nominally take things to 69 below 33.5, but keep some just ahead of the wavefront to be available to those who want to go to 70. Thoughts?[/QUOTE] [URL]http://mersenneforum.org/showpost.php?p=294572&postcount=1725[/URL] might answer your question. By my calculations, 2 of my systems are efficient at the current transition points and the other 3 become efficient within 3-4M. |
I think we'll all be happy to know that sometime in the last 24hrs or so, one of my P-1 cores that was running low picked up a 56M expo that was already at 72 bits from PrimeNet. The more I think about it, the better it is that we release candidates at 72 bits that we can't handle, 'cause now we're saving the "regulars" some work too :smile:
Edit: Now that I think about it, it's also probably partially due to the fact that Spidey is now keeping expos at 71, thereby removing those from the P-1 pool. Edit2: Can we have another list of Top 100 factors, except ranked by log2(factor)/exponent? That weights a 95 bit factor for a 48M expo higher than 95 bits for 55M expo. (You might need a scaling factor as well, so that the number isn't ridiculously small :razz:) |
[QUOTE=Dubslow;294718]
Edit2: Can we have another list of Top 100 factors, except ranked by log2(factor)/exponent? That weights a 95 bit factor for a 48M expo higher than 95 bits for 55M expo. (You might need a scaling factor as well, so that the number isn't ridiculously small :razz:)[/QUOTE] This could only make sense for TF, and not too much sense for P-1 or ECM. There is no factor in that list which was found by TF. We could have it sorted in any order we like, if the effort is not big to implement it. But I think the effort is high for the benefit we get. |
[QUOTE=LaurV;294756]This could only make sense for TF, and not too much sense for P-1 or ECM. There is no factor in that list which was found by TF. We could have it sorted in any order we like, if the effort is not big to implement it. But I think the effort is high for the benefit we get.[/QUOTE]
I never said it would be useful, just cool :razz::smile: |
New transition point for DCTF...
Just so everyone knows, based on the analysis by James, and bcp19 [URL="http://www.mersenneforum.org/showpost.php?p=294841&postcount=1742"]here[/URL], I have changed the DC transition from 69 to 70 bits from 29.69M to 32M.
All the reports, and of course the assignment page, have been updated accordingly. Also, while I'm "transmitting", bcp19 asked for his individual factoring statistics. I should have thought of doing that myself for everyone long ago; took three minutes. Everyone now has a new link under their "My Account" menu -- "[URL="http://www.gpu72.com/account/factoring_cost/"]Individual Factoring Cost[/URL]". Please let me know if anyone would like this data available to them in CSV format. |
[QUOTE=chalsall;294858]"[URL="http://www.gpu72.com/account/factoring_cost/"]Individual Factoring Cost[/URL]"[/QUOTE]Would it be easy to add a comparison to the total/average numbers for all users, so I can see how my cost-per-factor compares to overall?
|
[QUOTE=James Heinrich;294859]Would it be easy to add a comparison to the total/average numbers for all users, so I can see how my cost-per-factor compares to overall?[/QUOTE]
Not exactly easy, but yes, that's in the works. Also, I'm going to go ahead and do the CSV version of this, and will expose all useful data, including that which can't easily be made to fit on a web page. For each 1M range and bit level, I'll expose TF attempts, GHzDays Done, Factors Found, GHzDays Saved, and Expected Factors. With a spreadsheet (or programming language of one's choice), the Cost per Factor Found and Factor Found Percentage can easily be derived from the above fields. |
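The two derived metrics chalsall mentions work out as below (the field names are my own shorthand, not the actual CSV header):

```python
# Derive Cost per Factor Found and Factor Found Percentage from the
# per-range CSV fields described above (names are assumptions).
def derived_metrics(tf_attempts, ghzdays_done, factors_found):
    cost_per_factor = ghzdays_done / factors_found if factors_found else float("inf")
    factor_found_pct = 100.0 * factors_found / tf_attempts
    return cost_per_factor, factor_found_pct

# e.g. 1,000 attempts costing 2,500 GHzDays and yielding 25 factors:
print(derived_metrics(1000, 2500.0, 25))  # (100.0, 2.5)
```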
[QUOTE=chalsall;294858]Everyone now has a new link under their "My Account" menu -- "[URL="http://www.gpu72.com/account/factoring_cost/"]Individual Factoring Cost[/URL]".[/QUOTE]
P1? |
[QUOTE=petrw1;294861]P1?[/QUOTE]
[URL="http://www.gpu72.com/account/factoring_cost/p-1/"]P-1.[/URL] |
[QUOTE=James Heinrich;294859]Would it be easy to add a comparison to the total/average numbers for all users, so I can see how my cost-per-factor compares to overall?[/QUOTE]
Towards this functionality, I've built a weighted linear regression of all our factoring attempts vs. factoring successes across all the ranges we've worked, and all the bit levels we've worked to. To make sure the regression was "sane", I [URL="http://www.gpu72.com/reports/factor_percentage/graph/"]plotted the results[/URL]. This is updated in real-time, and is available as a sub-menu of the "Factor Found Percentages" menu. Please note that at the moment, the percentage found values are being under-reported for several ranges and bit-levels for reasons I won't bore you with. But hopefully this will correct itself soon (hint, hint to the person responsible... :wink:). (And, yes... A P-1 version of this will be available in the near future.) |
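For anyone curious what such a weighted linear regression looks like, here is a minimal pure-Python sketch. The data points are invented for illustration, and weighting each range by its attempt count is my assumption about the approach, not GPU72's actual code:

```python
# Weighted least-squares fit of P(factor) = intercept + slope * exponent
# for one bit level, weighting each 1M range by its number of attempts.
def wlr(xs, ys, ws):
    W = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / W
    ybar = sum(w * y for w, y in zip(ws, ys)) / W
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    slope = sxy / sxx
    return ybar - slope * xbar, slope  # (intercept, slope)

exponents = [48e6, 50e6, 52e6, 54e6, 56e6, 58e6]   # range midpoints (invented)
found_pct = [0.0119, 0.0121, 0.0120, 0.0123, 0.0122, 0.0124]
attempts  = [900, 1200, 1500, 800, 600, 400]       # regression weights

intercept, slope = wlr(exponents, found_pct, attempts)
print(f"P(factor) ~ {intercept:.6f} + {slope:.3e} * exponent")
```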
I'm currently kept away from home by work, so my GPU-laptop can't run (not taking it with me on the plane, and not letting it run hot at home without supervision - hot hot hot).
This situation should improve over the next 2 weeks. Chris: that might have an impact on my current assignments... |
[QUOTE=sonjohan;295193]Chris: that might have an impact on my current assignments...[/QUOTE]
No problem. I've marked all of your assignments as "Extended" so they won't be auto-expired. Have a good and safe trip! |
[QUOTE=James Heinrich;290960]I'm sure that's a tidbit of interest to many people -- could you put that on the user stats page (expected vs found factors; by bit range and overall)?
Of course, if you do that there's bound to be somebody complaining that they're missing some factors because they're below average.[/QUOTE] OK, it's taken some time, but this has now been implemented for Trial Factoring in both meta ranges (DC and LL), based on the weighted linear regression across our entire data set. The P-1 analysis is still to be implemented. This information is currently shown on the Individual Stats page. I'll place it on the "Workers' Overall Progress" page once I have the P-1 implemented (needed for the default aggregate display). Note that the statistics will be constantly adjusted as additional candidates are processed and the WLR gets additional data to fit to. |
Another Wow! This is really cool.
It confirms an impression that my brief period of DCTF yielded a remarkable number of factors. I'm barely under the expected for LLTF. |
[QUOTE=chalsall;295199]OK, it's taken some time, but this has now been implemented for Trial Factoring in both meta ranges (DC and LL), based on the weighted linear regression across our entire data set. The P-1 analysis is still to be implemented.
This information is currently shown on the Individual Stats page. I'll place it on the "Workers' Overall Progress" page once I have the P-1 implemented (needed for the default aggregate display). Note that the statistics will be constantly adjusted as additional candidates are processed and the WLR gets additional data to fit to.[/QUOTE][QUOTE=James Heinrich;290960]I'm sure that's a tidbit of interest to many people -- could you put that on the user stats page (expected vs found factors; by bit range and overall)? Of course, if you do that there's bound to be somebody complaining that they're missing some factors because they're below average.[/QUOTE] You really weren't kidding. [code]
       Expected  Found
DC TF     4.294      1
LL TF    21.919     14
[/code] The DC TF is probably low sample size, but 14/22? That's crazy. And chalsall, based on what you're saying, this is after taking into account that 72 bits takes more work and has lower chance? |
[QUOTE=Dubslow;295214]The DC TF is probably low sample size, but 14/22? That's crazy. And chalsall, based on what you're saying, this is after taking into account that 72 bits takes more work and has lower chance?[/QUOTE]
Yes. Sorry to say, but you've really been unlucky. To be clear, the Expected value is calculated as a sum of each bit level attempted mapped to a probability derived from our own empirical stats driven through a weighted linear regression for each range and bit level. The prediction should be quite accurate. |
[QUOTE=chalsall;295215]
The prediction should be quite accurate.[/QUOTE] Famous last words :smile: (Edit: It occurs to me that since I have a slower GPU, I had just been grabbing all the loose expos below 50M. Does the WLR account for range of work as well?) |
[QUOTE=Dubslow;295217]Famous last words :smile:[/QUOTE]
Trust me -- I did sanity checks before I made the data public. It's accurate. For example, across all workers, the prediction is for 1,125.371 DCTF successes, and 1,613.667 LLTF. Actual: 1,145 DCTF; 1,618 LLTF. [QUOTE=Dubslow;295217](Edit: It occurs to me that since I have a slower GPU, I had just been grabbing all the loose expos below 50M. Does the WLR account for range of work as well?)[/QUOTE] Yup. Let me give you the SQL for the LL range: [CODE]select User,
    sum(if(FactFrom<'68' and FactTo>='68',  0.0043212509769495  + Exponent *  1.7068847160983e-10,  0)) +
    sum(if(FactFrom<'69' and FactTo>='69',  0.00917633594975476 + Exponent *  3.73910451142651e-11, 0)) +
    sum(if(FactFrom<'70' and FactTo>='70',  0.0127505265073034  + Exponent * -4.80405319857885e-11, 0)) +
    sum(if(FactFrom<'71' and FactTo>='71', -0.00785194516204015 + Exponent *  3.71547581675963e-10, 0)) +
    sum(if(FactFrom<'72' and FactTo>='72', -0.00502015659011862 + Exponent *  2.89752696241295e-10, 0)) +
    sum(if(FactFrom<'73' and FactTo>='73',  0.0134851694734411  + Exponent * -4.32266525103148e-11, 0)) +
    sum(if(FactFrom<'74' and FactTo>='74',  0                   + Exponent *  0,                    0))
        as Expected
from Assigned
where Status=1 and WorkType=200
group by User[/CODE] Note that the numbers in the middle of the "if(,,)" statements are the intercept and slope from the WLR. WorkType of 200 is LLTF. The 0's for the 74 bit level are because we haven't found any factors there. And, as an aside, with a quarter of a million records in the Assigned table, this query takes 0.65 seconds. Gotta love MySQL!!! :smile: |
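For anyone who'd rather not read SQL, the same per-candidate expectation can be evaluated directly. The coefficients below are copied verbatim from the query above; the function name and the 55M example are my own:

```python
# Per-bit-level factor probability: intercept + slope * exponent,
# using the WLR coefficients from the LL-range SQL above.
WLR = {
    68: (0.0043212509769495,   1.7068847160983e-10),
    69: (0.00917633594975476,  3.73910451142651e-11),
    70: (0.0127505265073034,  -4.80405319857885e-11),
    71: (-0.00785194516204015, 3.71547581675963e-10),
    72: (-0.00502015659011862, 2.89752696241295e-10),
    73: (0.0134851694734411,  -4.32266525103148e-11),
    74: (0.0, 0.0),  # no factors found at 74 yet
}

def expected_factors(exponent, fact_from, fact_to):
    """Sum the fitted P(factor) over each bit level an assignment covers,
    mirroring the SQL condition FactFrom < level and FactTo >= level."""
    return sum(b + m * exponent
               for bit, (b, m) in WLR.items()
               if fact_from < bit <= fact_to)

# e.g. one 55M candidate taken from 71 to 72 bits:
print(round(expected_factors(55_000_000, 71, 72), 4))  # 0.0109
```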
[QUOTE=chalsall;295220]Trust me -- I did sanity checks before I made the data public. It's accurate.
For example, across all workers, the prediction is for 1,125.371 DCTF successes, and 1,613.667 LLTF. Actual: 1,145 DCTF; 1,618 LLTF. [/QUOTE] 'Twas a joke, good sir ;) |
[CODE]
    Expected  Found
DC     1.408      2
LL   205.506    207
[/CODE] I am very average :) |
[QUOTE=kladner;295203]Another Wow! This is really cool.
It confirms an impression that my brief period of DCTF yielded a remarkable number of factors. I'm barely under the expected for LLTF.[/QUOTE] They don't line up in this view, but the categories are: Work Type, Rank Of, Candidates (Assigned, Completed), Factors (Expected, Found), GHz Days [CODE]DC TF  15  44       710   7.634  16   1,617.285
LL TF  12  59  70  1,385  35.373  34  19,678.934[/CODE] |
When you do implement P-1, can you also show individual average bit size vs. overall average bit size? While what I've lacked in TF factors I make up for in P-1 factors, all of them have been 91 bits or under, and only 1 factor >90 bits.
|
[QUOTE=Dubslow;295214]You really weren't kidding.
[code]
       Expected  Found
DC TF     4.294      1
LL TF    21.919     14
[/code] The DC TF is probably low sample size, but 14/22? That's crazy. And chalsall, based on what you're saying, this is after taking into account that 72 bits takes more work and has lower chance?[/QUOTE] I got some of your factors :smile: [code]
       Expected  Found
DC TF     3.904      4
LL TF    61.722     67
[/code] Another nice feature this is ... |
[QUOTE=Dubslow;295239]When you do implement P-1, can you also show individual average bit size vs. overall average bit size? While what I've lacked in TF factors I make up for in P-1 factors, all of them have been under 91 bits, and only 1 factor >90 bits.[/QUOTE]
Don't know if this is useful to you, but some time ago James posted a table to show the number of factors found at each bit level. I summarized this by observing that the P-1 factors went as 2^(-bits/8). The aim of this research was to judge how much an extra bit of TF overlapped with P-1. (I wouldn't be surprised if I have not explained this as clearly as I might have done:smile:) David |
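One way to read that observation (my own gloss on it, not davieddy's exact calculation):

```python
# If P-1 factor counts fall off as 2^(-bits/8), then each successive
# bit level holds 2^(-1/8) as many P-1 factors as the one below it --
# i.e. roughly 8% fewer per extra bit of TF depth.
ratio = 2 ** (-1 / 8)
print(round(ratio, 3))  # 0.917
```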
As a matter of fact, chalsall has a table of the 100 largest factors for GPU272, which means the data is limited to our specific exponent range. I rank very low on the list, despite a sizeable number of total factors.
|
No comprenti moi.
[QUOTE=Dubslow;295306]As a matter of fact, chalsall has a table of the 100 largest factors for GPU272, which means the data is limited to our specific exponent range. I rank very low on the list, despite a sizeable number of total factors.[/QUOTE]
[url=http://www.youtube.com/watch?v=h8NJ8T0T4uI&feature=endscreen&NR=1]Our hardware size is irrelevant[/url]. My Core2 runs quietly and coolly at 2 GHz 24/7, and surprising as it may seem to our mutual friend Bob, manages 2 GHzdays/day. (Video excluded) As such, I could claim to speak for the majority of GIMPS participants. "Our exponent range" is 2^(~25). I share in the enthusiasm Chris has brought to George's great project, but note that you GPUto72 guys are a tiny and atypical fraction of all participants. I would describe myself as an even less representative example:smile: David |
A few days ago:
[CODE]
      expected  found
DCTF    230.34    259
LLTF   271.019    269
[/CODE] As I type this: [CODE]
      expected  found
DCTF    230.34    259
LLTF   273.885    275
[/CODE] I must be skewing the DCTF figures (+12.4% ish). Unless someone tells me that deviation is to be expected. Or there's a bunch of people having some hardware issues doing DCTF. The few percentage points deviation on LLTF I guess is to be expected. BTW I've changed my allocation policy to pick older exponents. I thought clearing out the old ones might be the way to go. -- Craig |
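Whether a +12.4% excess is surprising can be eyeballed with a quick Poisson check (my own back-of-envelope, assuming candidates are independent trials):

```python
import math

# Is 259 factors found against 230.34 expected unusual?
expected, found = 230.34, 259
sigma = math.sqrt(expected)        # Poisson standard deviation, ~15.2
z = (found - expected) / sigma
print(round(z, 2))  # 1.89 -- under 2 sigma, so plain luck covers it
```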
[QUOTE=nucleon;295447]I must be skewing the DCTF figures (+12.4% ish). Unless someone tells me that deviation is to be expected. Or there's a bunch of people having some hardware issues doing DCTF.[/QUOTE]
I think it's a couple of things... First of all, you're definitely "luckier" than some others in the DC range. However, another influence is possibly because I'm currently using the same WLR for both ranges. This is probably not the ideal "fit" (although across our full sample set it works out to be better than 99% accurate) because all the DCs have had P-1 done, while less than half of LLs have had P-1 previously attempted. This weekend I'll look into using separate WTF slopes and intercepts for the two range classes, and see if I can make the predictions even more accurate. [QUOTE=nucleon;295447]BTW I've changed my allocation policy to pick older exponents. I thought clearing out the old ones might be the way to go.[/QUOTE] Excellent. Thank you. And as some might have noticed, we're now effectively out of all work except 71 -> 72. |
[QUOTE=chalsall;295455]I think it's a couple of things...
First of all, you're definitely "luckier" than some others in the DC range. Excellent. Thank you. And as some might have noticed, we're now effectively out of all work except 71 -> 72.[/QUOTE] I count myself as "luckier" than some others, too, in the DC range. I have been picking up some 71-72 assignments, but I have stopped taking assignments for now. I am going to be away starting 04/08/2012 for a few days. I don't really like to leave the box running unattended, so it will be shut down. I have been letting my assignments wind down. I did extend my newer LL/DC assignments, just in case. |
[QUOTE=chalsall;295455]
This weekend I'll look into using separate WTF slopes and intercepts for the two range classes[/QUOTE] ...what the flux? |
[QUOTE=chalsall;295455]This weekend I'll look into using separate WTF slopes and intercepts for the two range classes[/QUOTE]
Wednesday, Thursday, Friday |
[QUOTE=petrw1;295458]Wednesday, Thursday, Friday[/QUOTE]
So Happy It's Thursday!:poop: |
[QUOTE=petrw1;295458]Wednesday, Thursday, Friday[/QUOTE]
LOL... I figured everyone would know what WTF meant: [COLOR="White"]Wrote Too Fast...[/COLOR] I meant, of course, WLR. |
Couple of things: First, the scale labeling for the P-1 graph [URL="http://gpu72.com/reports/overall/graph/"]here[/URL] is messed up. It goes null,0,1,1,2, where " " is the y value of the x-axis.
Secondly, why are there only ~370 P-1 candidates? There isn't an unusual amount reserved; is it just (an odd) coincidence? |
[QUOTE=Dubslow;295510]Couple of things: First, the scale labeling for the P-1 graph [URL="http://gpu72.com/reports/overall/graph/"]here[/URL] is messed up. It goes null,0,1,1,2, where " " is the y value of the x-axis.[/QUOTE]
Whoops. Thanks. I made some adjustments to the graphs so they would line up better. Unfortunately when I looked at the weekly graphs P-1 had a large block of results which made the labels sane. Not so today. Fixed. [QUOTE=Dubslow;295510]Secondly, why are there only ~370 P-1 candidates? There isn't an unusual amount reserved; is it just (an odd) coincidence?[/QUOTE] No. We've got a large batch of TF results without P-1 expected to complete shortly, so I threw 600 back to PrimeNet to hand out as it feels is appropriate. |
Chalsall,
Can you check your Get LL TF form? I selected oldest exponent and it allocated based on lowest TF level. I can't rule out user error. I hit the back button in my browser and it showed oldest. -- Craig |