[QUOTE=LaurV;457487]It doesn't matter what machines you use, and how fast is one compared with the other. If GPUs become 1000 times faster than they currently are, we will raise the factoring bitlevel with a couple of bits (ten, more exactly, as the amount of work doubles with each bitlevel), but it will still be that the last bitlevel takes a double amount of time than the former-last, for about the same chance to find a factor (1/n vs. 1/(n+1), or so, see GIMPS math page). For P-1 the chances get higher faster, with the amount of work you do (limits, FFT size). Therefore the P-1 and TF "chances curves" will still intersect somewhere, and unless that is exactly in an integer point, it will still be advantageous to do P-1 before last bit of TF for some ranges. Especially thinking that we also use GPUs to do P-1 too (see cudaPM1).[/QUOTE]
Such a graph would make a useful addition to James' site.
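A rough sketch of the two "chances curves" can be computed from the heuristics in the quote: per the GIMPS math page, the chance that a factor lies in bit level b is roughly 1/b, while TF cost doubles with each bit. The function below is mine, not from the thread, and the cost units are arbitrary relative units:

```python
# Sketch (not from the thread): chance of a factor at bit level b is
# roughly 1/b (GIMPS math-page heuristic), while TF cost doubles per
# bit level.  This makes the final bit level the least productive.

def tf_marginal_value(bits, unit_cost=1.0):
    """Return (bit, relative_cost, chance, chance_per_cost) per bit level.

    Cost doubles with each extra bit; chance per bit is ~1/bit.
    """
    rows = []
    cost = unit_cost
    for b in bits:
        chance = 1.0 / b
        rows.append((b, cost, chance, chance / cost))
        cost *= 2.0
    return rows

for b, cost, chance, value in tf_marginal_value(range(70, 78)):
    print(f"bit {b}: relative cost {cost:6.1f}, "
          f"chance {chance:.4f}, chance/cost {value:.6f}")
```

The steadily falling chance/cost column is exactly why, at some bit level, a P-1 run (whose chance grows faster with effort) overtakes the last bit of TF.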
progression of an exponent
[QUOTE=science_man_88;457201][URL]https://www.mersenne.org/various/math.php[/URL] isn't clear enough ?[/QUOTE]
For prime95, the program supports the multiple approaches and does the right thing for efficiency, given the tradeoffs as they are understood. That can be maximized by setting Test, Worker Windows, Type of work to get to "Whatever makes the most sense". As I understand it (and George (prime95) et al. can chime in if I have it wrong, and please do), the picture changes somewhat when a mix of GPUs and CPUs is applied.

The link you gave relates to the trial factoring vs. LL test cost/benefit for CPUs. PrimeNet spreads the work around in time and across computing devices, giving out trial factoring assignments one bit depth at a time, then P-1, first-time Lucas-Lehmer tests, and double checks (and rechecks if a mismatch is found). This lets GPUs be applied to greater trial factoring depths, at which they are very productive, saving CPU time for LL tests. (Rated GHz-days/day of GPUs are MUCH higher in TF than in other uses.) Consequently, deeper trial factoring is worthwhile on GPUs than on CPUs. For a given GPU, there are plots of GPU trial factoring versus Lucas-Lehmer testing via CUDALucas on the same model, such as for the GTX 1070 at [URL]http://www.mersenne.ca/cudalucas.php?model=683[/URL]

Currently the various approaches (trial factoring, P-1, Lucas-Lehmer test) are separate programs on GPUs, so PrimeNet and operator involvement are needed to do the right thing there. CUDAPm1 seems to me to pick pretty good P-1 parameters automatically, although I haven't studied that much.

The system isn't perfect. I have observed that almost any reported P-1 result counts. I recently checked many dozen manual assignments I had received for double checking, and found over a dozen had been P-1 factored with B1=B2 lower than expected (stage 1 only). But these were not available for reserving for P-1, and had not been run as deeply in P-1 as would be optimal.
I also observed, when obtaining detailed reports on exponents I had reserved for double checking, that in a case or two someone was still running and reporting updates on exponents that had been expired for them more than a year before. Being in that same boat with my slow laptop, I tried to reserve as double checks a couple of exponents I had received originally as first-time tests and was still running, so no one else would be assigned them as double checks and an unnecessary triple check would be avoided. I could not.

I was surprised to find that the majority of double checks I had been assigned earlier this year in the 43M-44M range had been assigned and expired SEVEN times before. Most were 0.0-1% complete; there were a few exceptions at 29-40%, and even over 65% complete when abandoned. Progress of an exponent can be glacial in this scheme. That does not mean it's wrong. From what I saw, it's not unusual for an LL test completed in 2008 or 2009 to be double checked in 2017. It's all volunteer effort, and people run whatever they like. Perhaps the ones I was assigned were the exceptions.

The progress over time of a given exponent can be obtained at, for example, [URL]https://www.mersenne.org/report_exponent/?exp_lo=43166381&exp_hi=&full=1[/URL] For that one, the double check was completed within months of the first-time LL test. Perhaps that is more typical. Overall, the current state of progress can be seen in the work distribution map at [URL]https://www.mersenne.org/primenet/[/URL]
P-1 graphs
[QUOTE=Mark Rose;457498]Such a graph would make a useful addition to James' site.[/QUOTE]
Yes, it would. I inquired of James recently; he doesn't have the data on P-1 versus the others (TF, LL) to make such tradeoff graphs. He's done a very nice job on TF vs. LL per GPU model, though. The GPU TF/LL tradeoff is quite different from the CPU tradeoff.

Presumably the author of CUDAPm1 built some approximation of a sensible P-1/LL tradeoff into it, since it can pick its own bounds when the worktodo line for an exponent specifies other things but not bounds. So P-1/LL tradeoff plots may not be needed. It seems to me P-1 effort could use more computing resources; it's not that far ahead of first-time Lucas-Lehmer testing, as indicated by Current Progress, Work Distribution Map at [URL]https://www.mersenne.org/primenet/[/URL] thousands of LL tests assigned in 81M, and thousands of P-1 assigned in 82M!
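The crossover decision LaurV describes could be sketched as a simple chance-per-cost comparison. The TF side below uses the ~1/b per-bit heuristic from the GIMPS math page; the P-1 cost and success chance are purely illustrative placeholder numbers of mine, not measurements from any GPU or from CUDAPm1:

```python
# Hypothetical illustration of the TF-vs-P-1 crossover.  The P-1 cost
# and chance arguments are placeholder values for illustration only;
# real values depend on bounds, FFT size, and hardware.

def prefer_p1_before_last_bit(last_bit, last_bit_cost, p1_cost, p1_chance):
    """True if a P-1 run with the given (assumed) cost and success
    chance yields more factor-chance per unit cost than the final TF
    bit level, whose chance is ~1/last_bit (GIMPS heuristic)."""
    tf_value = (1.0 / last_bit) / last_bit_cost
    p1_value = p1_chance / p1_cost
    return p1_value > tf_value

# Placeholder numbers: last TF bit 76 costing 10 (relative units) vs.
# a P-1 run costing 4 with an assumed 3% chance of finding a factor.
print(prefer_p1_before_last_bit(76, 10.0, 4.0, 0.03))
```

With real per-model timing data (such as James' TF/LL plots) substituted for the placeholders, this is the kind of comparison a tradeoff graph would visualize.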
P-1 tradeoff
[QUOTE=LaurV;457487]It doesn't matter what machines you use, and how fast is one compared with the other. If GPUs become 1000 times faster than they currently are, we will raise the factoring bitlevel with a couple of bits (ten, more exactly, as the amount of work doubles with each bitlevel), but it will still be that the last bitlevel takes a double amount of time than the former-last, for about the same chance to find a factor (1/n vs. 1/(n+1), or so, see GIMPS math page). For P-1 the chances get higher faster, with the amount of work you do (limits, FFT size). Therefore the P-1 and TF "chances curves" will still intersect somewhere, and unless that is exactly in an integer point, it will still be advantageous to do P-1 before last bit of TF for some ranges. Especially thinking that we also use GPUs to do P-1 too (see cudaPM1).[/QUOTE]
Do you have data on what the P-1 vs. LL vs. trial factoring tradeoff should be for GPU models that were not available when CUDAPm1 (which apparently embodies some sort of coded tradeoff) was written, or a sense of what that tradeoff algorithm in CUDAPm1 is?
CUDA run times vs exponent for TF, P-1, LL test
I've posted some data for the same GTX480 and a wide variety of exponent samples at [URL]http://www.mersenneforum.org/showthread.php?t=22450[/URL] recently.