Yes, a small part of mfaktc is still a CPU task.
Reading your post, it came to mind that, if I'm not mistaken, the GCD step of CudaPm1 is still a CPU task too, so when you think about computing times, running P95 slows you down there as well (though the GCD step takes only a minuscule fraction of the total time, depending on the exponent).
[QUOTE=frmky;412521]How far ahead of P-1 are we if we change strategy so that rather than attempt a full TF before P-1, we plan to TF to one bit level lower, currently 74, before P-1, then complete the final bit level after P-1? With the excess P-1 power right now, it should be much easier to stay ahead of LL.[/QUOTE]
To speak to this a bit further... My thinking is that it is better to go to 75 before P-1 (where possible) because it lets the P-1 run search with higher bounds. Further, to speak to LaurV's argument: on James' graph the 75-bit crossover point is indeed 66M, but keep in mind that even down at 60M it's still 74.6545 bits. In my mind this means it would still be "profitable" to go to 75 down there (or ideally to 74.6545 bits, if mfaktX supported fractional levels). Happy to be proven wrong.
[QUOTE=chalsall;412572]To speak to this a bit further... My thinking is that it is better to go to 75 before P-1 (where possible) because it lets the P-1 run search with higher bounds.[/QUOTE]
Actually with a higher TF level, P95 runs with lower bounds since there is a smaller chance of finding a factor. To get actual numbers, I used 73412063, which is currently TF'd to 76. Given TF to 76, P95 on my computer runs with B1=555k, B2=7.77M. For lower TF levels, we find ... [CODE]TF  B1    B2
74  635K  9.68M
75  605K  8.77M
76  555K  7.77M[/CODE]
[QUOTE=frmky;412620]Actually with a higher TF level, P95 runs with lower bounds since there is a smaller chance of finding a factor.[/QUOTE]It's the complex relationship between bounds, factor probability and runtime. Higher bounds mean a higher chance of finding a factor, but a longer runtime. Runtime grows much faster with higher bounds than factor probability does, so there's a break-even point somewhere. Prime95 tries to pick bounds that maximize factors-per-unit-time (through an iterative trial-and-error process, if I understand correctly).
Expanding on the above table a bit: [CODE]TF  B1    B2     FactorProb
74  635K  9.68M  3.406898%
75  605K  8.77M  2.993129%
76  555K  7.77M  2.591607%[/CODE]
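A rough way to see why a higher TF level leaves less for P-1 to chase: a common heuristic (not the exact model Prime95 uses) says a Mersenne number has a prime factor between 2^b and 2^(b+1) with probability roughly 1/b. A minimal Python sketch under that assumption:

```python
# Sketch only: the classic ~1/b heuristic for the chance that a Mersenne
# number has a prime factor between 2^b and 2^(b+1).  This is an editorial
# illustration, not Prime95's actual bounds-selection model.
def prob_factor_in_bit_range(b_lo, b_hi):
    """Approximate probability of a factor between 2^b_lo and 2^b_hi."""
    return sum(1.0 / b for b in range(b_lo, b_hi))

# Factor "mass" left between the completed TF level and (say) 90 bits:
for tf in (74, 75, 76):
    print(tf, round(prob_factor_in_bit_range(tf, 90), 4))
```

Each completed bit level removes roughly 1/b of the remaining factor probability, so with more TF done there is less for P-1 to find, and Prime95 responds by choosing smaller bounds.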
GPU72 not noticing completed 65M TF work
GPU72 assigned me exponents in the 65M range to take to 75 bits, but it doesn't seem to be noticing the completion of that work.
It has since picked them up.
[QUOTE=Chuck;412624]It has since picked them up.[/QUOTE]
Yeah; thanks for the "ping". Spidy wasn't watching that range because we haven't had any candidates down there until recently; that range has slipped down into Cat 1, so very old assignments are being recycled.
[QUOTE=James Heinrich;412621]Expanding on the above table a bit:[/QUOTE]
OK, help me out here guys. As you know, I do code, not math. What is optimal for GIM[B][U]P[/U][/B]S? As in, what will find the most factors per unit of time based on our available resources? Should we indeed only go to 74 (or even 73 and below) before the P-1 run, and then to 75 if a factor isn't found? My (perhaps mis-)understanding was that going as high as we could with TF'ing first was better, but if it's not then we should change our strategy. GPU72 was created and is managed to help find the next Mersenne Prime, not just to find factors.
[QUOTE=chalsall;412645]OK, help me out here guys. As you know, I do code, not math.[/QUOTE]Same here... :cool:
But I can generate some numbers to work with. Continuing the above example, I set Prime95 to use 4GB and ran the same exponent through at different TF levels (just long enough to see the bounds and ETA):[code]Exponent  TF  B1      B2        Prob  Runtime
73412063  65  785000  30026250  9.87  27h07m
73412063  66  805000  28980000  9.06  26h41m
73412063  67  805000  27168750  8.24  25h33m
73412063  68  805000  25357500  7.46  24h25m
73412063  69  800000  23600000  6.73  23h18m
73412063  70  795000  21663750  6.02  22h03m
73412063  71  765000  19698750  5.35  20h30m
73412063  72  730000  16972500  4.69  18h26m
73412063  73  690000  15007500  4.12  16h45m
73412063  74  655000  13263750  3.62  15h18m
73412063  75  610000  11895000  3.17  13h58m
73412063  76  565000  10452500  2.75  12h35m
73412063  77  535000   9496250  2.40  11h39m
73412063  78  495000   8415000  2.07  10h35m
73412063  79  455000   7166250  1.76   9h22m
73412063  80  410000   6047500  1.47   8h11m[/code]
[QUOTE=James Heinrich;412653]But I can generate some numbers to work with.[/QUOTE]
Interesting... And, adding another column, "Probability per Hour", we get:[CODE]Level  Prob/Hour
65     0.3640
66     0.3395
67     0.3225
68     0.3055
69     0.2888
70     0.2730
71     0.2610
72     0.2544
73     0.2460
74     0.2366
75     0.2270
76     0.2185
77     0.2060
78     0.1956
79     0.1879
80     0.1796[/CODE] OK, I'm beginning to be convinced that releasing for P-1'ing at lower levels might make sense, for two reasons. First, it would slow down the P-1'ing, since each run takes longer. And secondly, it would mean that the GPU TF'ers would have (slightly) less work to do. I'm wondering, though... This test was with 4GB allocated. Is the same trend evident with less? Separately, is the same trend present for all candidate ranges? What about when Stage 2 isn't done? Perhaps Aaron could speak to what percentage of P-1 runs have both stages done? Thoughts?
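The Prob/Hour column above is just each row's factor probability divided by its runtime. A minimal Python sketch reproducing it from James' table (the `hours` helper and the row list are editorial; the numbers are copied from the posts above):

```python
# Rows copied from James' table: (TF level, factor probability %, runtime).
rows = [
    (65, 9.87, "27h07m"), (66, 9.06, "26h41m"), (67, 8.24, "25h33m"),
    (68, 7.46, "24h25m"), (69, 6.73, "23h18m"), (70, 6.02, "22h03m"),
    (71, 5.35, "20h30m"), (72, 4.69, "18h26m"), (73, 4.12, "16h45m"),
    (74, 3.62, "15h18m"), (75, 3.17, "13h58m"), (76, 2.75, "12h35m"),
    (77, 2.40, "11h39m"), (78, 2.07, "10h35m"), (79, 1.76, "9h22m"),
    (80, 1.47, "8h11m"),
]

def hours(runtime):
    """Parse an 'XXhYYm' runtime string into fractional hours."""
    h, m = runtime.rstrip("m").split("h")
    return int(h) + int(m) / 60

# Probability-per-hour of the P-1 run at each prior TF level.
for tf, prob, runtime in rows:
    print(f"{tf}  {prob / hours(runtime):.4f}")
```

The steady decline is the crux of the discussion: every extra TF bit completed before P-1 lowers the factors found per hour of P-1 work.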
[QUOTE=James Heinrich;412653]Same here... :cool:
But I can generate some numbers to work with. Continuing the above example, I set Prime95 to use 4GB and ran the same exponent through at different TF levels (just long enough to see the bounds and ETA):[code]Exponent  TF  B1      B2        Prob  Runtime  970TF
73412063  65  785000  30026250  9.87  27h07m
73412063  66  805000  28980000  9.06  26h41m
73412063  67  805000  27168750  8.24  25h33m
73412063  68  805000  25357500  7.46  24h25m
73412063  69  800000  23600000  6.73  23h18m
73412063  70  795000  21663750  6.02  22h03m
73412063  71  765000  19698750  5.35  20h30m
73412063  72  730000  16972500  4.69  18h26m
73412063  73  690000  15007500  4.12  16h45m   0h38m
73412063  74  655000  13263750  3.62  15h18m   1h15m
73412063  75  610000  11895000  3.17  13h58m   2h30m
73412063  76  565000  10452500  2.75  12h35m
73412063  77  535000   9496250  2.40  11h39m
73412063  78  495000   8415000  2.07  10h35m
73412063  79  455000   7166250  1.76   9h22m
73412063  80  410000   6047500  1.47   8h11m[/code][/QUOTE] I partially filled in another column; one can simply extrapolate to the bit levels above and below. So my GTX-970 Extreme can complete the TF 73-74 above in about 1.25 hours and save James' P-1 run 1.45 hours. The next bit, TF 74-75, would take my card 2.5 hours and save P-1 1.33 hours. However, the odds that the TF finds a factor are about 1/3 those of the P-1. Mind you, if the TF does find a factor it saves the entire P-1 time ... OK, maybe I need a REAL math/stats person too.
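The back-of-the-envelope comparison above can be written out as an expected-value sketch. All numbers are the poster's estimates from the table (the ~1/75 chance of TF 74-75 finding a factor is the usual heuristic, consistent with "about 1/3" of the 3.17% P-1 probability), and note the caveat that GPU hours and CPU hours are not directly interchangeable:

```python
# Rough expected-value sketch of the TF 74->75 decision, using the
# poster's numbers.  Editorial illustration only; GPU hours vs CPU hours
# are different resources, so this is not a single-currency comparison.
tf_cost_gpu_h = 2.5      # GTX-970 time to TF 74->75
pm1_saved_h = 1.33       # P-1 runtime reduction from the extra TF bit
p_tf_factor = 1 / 75     # heuristic chance TF 74->75 finds a factor
pm1_runtime_h = 15.3     # P-1 runtime at TF=74 (15h18m from the table)

# Expected CPU hours saved: the guaranteed bounds reduction, plus the
# whole P-1 run that becomes unnecessary when TF does find a factor.
expected_cpu_saved = pm1_saved_h + p_tf_factor * pm1_runtime_h
print(f"GPU hours spent: {tf_cost_gpu_h}, "
      f"expected CPU hours saved: {expected_cpu_saved:.2f}")
```

Under these assumptions the extra bit trades about 2.5 GPU hours for roughly 1.5 expected CPU hours saved, which is why the decision hinges on which resource (GPU TF or CPU P-1) is currently in surplus.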
I think if we have the TF capacity, we should still fully TF before P-1. It's work that needs to be done anyway, and if it saves P-1 time, it increases the overall system throughput.
If I understand the math correctly, by eliminating the smaller factors, any remaining factor would have to be [url=https://en.wikipedia.org/wiki/Smooth_number#Powersmooth_numbers]more smooth[/url] for P-1 to find it, so the range of potential factors is smaller. P-1 is like TF in that, beyond some point, it's cheaper to just run the full LL test than to keep searching for factors. So by TF'ing higher, we save P-1 work at no detriment, if I understand correctly. So the real problem is that we have too many resources doing P-1 work and not enough doing LL/DC or TF.