mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU to 72 (https://www.mersenneforum.org/forumdisplay.php?f=95)
-   -   GPU to 72 status... (https://www.mersenneforum.org/showthread.php?t=16263)

LaurV 2015-10-13 07:01

Yes, a small part of mfaktc is still a CPU task.
It came to mind reading your post that, if I am not mistaken, the GCD step of cudaPm1 is also still a CPU task, so when you think about computing times, running P95 slows you down there too (though the GCD step takes only a minuscule fraction of the total time, depending on E as well)

chalsall 2015-10-13 14:08

[QUOTE=frmky;412521]How far ahead of P-1 are we if we change strategy so that rather than attempt a full TF before P-1, we plan to TF to one bit level lower, currently 74, before P-1, then complete the final bit level after P-1? With the excess P-1 power right now, it should be much easier to stay ahead of LL.[/QUOTE]

To speak to this a bit further... My thinking is that it is better to go to 75 before P-1 (where possible) because it lets the P-1 run search with higher bounds.

Further, to speak to LaurV's argument: on James' graph, yes, the 75-bit crossover point is indeed 66M, but keep in mind that even down at 60M it's still 74.6545 bits. In my mind this means it would still be "profitable" to go to 75 down there (or ideally to 74.6545 bits, if mfaktX supported fractional levels).

Happy to be shown that my thinking is wrong.

frmky 2015-10-13 22:49

[QUOTE=chalsall;412572]To speak to this a bit further... My thinking is that it is better to go to 75 before P-1 (where possible) because it lets the P-1 run search with higher bounds.[/QUOTE]
Actually with a higher TF level, P95 runs with lower bounds since there is a smaller chance of finding a factor. To get actual numbers, I used 73412063 which is currently TF'd to 76. Given TF to 76, P95 on my computer runs with B1=555k, B2=7.77M. For lower TF levels, we find ...
[CODE]TF  B1    B2
74  635K  9.68M
75  605K  8.77M
76  555K  7.77M[/CODE]

James Heinrich 2015-10-13 23:48

[QUOTE=frmky;412620]Actually with a higher TF level, P95 runs with lower bounds since there is a smaller chance of finding a factor.[/QUOTE]It's the complex relationship between bounds, factor probability, and runtime. Higher bounds mean a higher chance of finding a factor, but a longer runtime. Runtime goes up much more quickly with higher bounds than factor probability does, so there's a break-even point somewhere. Prime95 tries to pick bounds that maximize factors-found-per-unit-time (through an iterative trial-and-error process, if I understand correctly).

Expanding on the above table a bit:
[CODE]TF  B1    B2     FactorProb
74  635K  9.68M  3.406898%
75  605K  8.77M  2.993129%
76  555K  7.77M  2.591607%[/CODE]

Chuck 2015-10-14 00:32

GPU72 not noticing completed 65M TF work
 
GPU72 assigned me exponents in the 65M range to take to 75 bits, but it doesn't seem to be noticing the completion of that work.

........

It has since picked them up.

chalsall 2015-10-14 13:44

[QUOTE=Chuck;412624]It has since picked them up.[/QUOTE]

Yeah; thanks for the "ping". Spidy wasn't watching that range because we haven't had any candidates down there until recently; that range has slipped down into Cat 1, so very old assignments are being recycled.

chalsall 2015-10-14 13:51

[QUOTE=James Heinrich;412621]Expanding on the above table a bit:[/QUOTE]

OK, help me out here guys. As you know, I do code, not math.

What is optimal for GIM[B][U]P[/U][/B]S? As in, what will find the most factors per time unit based on our available resources?

Should we indeed only go to 74 (or even 73 and below) before the P-1 run, and then 75 if a factor isn't found?

My (perhaps mis-)understanding was that going as high as we could with TF'ing first was better; but if it's not, then we should change our strategy. GPU72 was created and is managed to help find the next Mersenne Prime, not just to find factors.

James Heinrich 2015-10-14 14:26

[QUOTE=chalsall;412645]OK, help me out here guys. As you know, I do code, not math.[/QUOTE]Same here... :cool:

But I can generate some numbers to work with. Continuing the above example, I set Prime95 to use 4GB and ran the same exponent through at different TF levels (just long enough to see the bounds and ETA):[code]
Exponent  TF  B1      B2        Prob  Runtime
73412063  65  785000  30026250  9.87  27h07m
73412063  66  805000  28980000  9.06  26h41m
73412063  67  805000  27168750  8.24  25h33m
73412063  68  805000  25357500  7.46  24h25m
73412063  69  800000  23600000  6.73  23h18m
73412063  70  795000  21663750  6.02  22h03m
73412063  71  765000  19698750  5.35  20h30m
73412063  72  730000  16972500  4.69  18h26m
73412063  73  690000  15007500  4.12  16h45m
73412063  74  655000  13263750  3.62  15h18m
73412063  75  610000  11895000  3.17  13h58m
73412063  76  565000  10452500  2.75  12h35m
73412063  77  535000   9496250  2.40  11h39m
73412063  78  495000   8415000  2.07  10h35m
73412063  79  455000   7166250  1.76   9h22m
73412063  80  410000   6047500  1.47   8h11m
[/code]

chalsall 2015-10-14 15:44

[QUOTE=James Heinrich;412653]But I can generate some numbers to work with.[/QUOTE]

Interesting... And, adding another column, "Probability per Hour", we get:[CODE]Level  Prob/Hour
65     0.3640
66     0.3395
67     0.3225
68     0.3055
69     0.2888
70     0.2730
71     0.2610
72     0.2544
73     0.2460
74     0.2366
75     0.2270
76     0.2185
77     0.2060
78     0.1956
79     0.1879
80     0.1796[/CODE]
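For reference, the Probability-per-Hour figures can be reproduced directly from James' table (P-1 success probability in percent, divided by runtime converted to hours):

```python
# Data from James Heinrich's table above: (TF level, P-1 probability %, runtime).
rows = [
    (65, 9.87, "27h07m"), (66, 9.06, "26h41m"), (67, 8.24, "25h33m"),
    (68, 7.46, "24h25m"), (69, 6.73, "23h18m"), (70, 6.02, "22h03m"),
    (71, 5.35, "20h30m"), (72, 4.69, "18h26m"), (73, 4.12, "16h45m"),
    (74, 3.62, "15h18m"), (75, 3.17, "13h58m"), (76, 2.75, "12h35m"),
    (77, 2.40, "11h39m"), (78, 2.07, "10h35m"), (79, 1.76, "9h22m"),
    (80, 1.47, "8h11m"),
]

def hours(runtime):
    # Convert a "XXhYYm" string to decimal hours.
    h, m = runtime.rstrip("m").split("h")
    return int(h) + int(m) / 60

prob_per_hour = {tf: prob / hours(rt) for tf, prob, rt in rows}
for tf, pph in sorted(prob_per_hour.items()):
    print(f"{tf}  {pph:.4f}")
```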

OK, I'm beginning to be convinced that releasing candidates for P-1'ing at lower bit levels might make sense, for two reasons. First, it would slow down the P-1'ers, since each run takes longer at the higher bounds. Second, it would mean that the GPU TF'ers would have (slightly) less work to do.

I'm wondering, though... This test was with 4GB allocated. Is the same trend evident with less? Separately, is the same trend present for all candidate ranges? What about when Stage 2 isn't done? Perhaps Aaron could speak to what percentage of P-1 runs have both stages done?

Thoughts?

petrw1 2015-10-14 15:50

[QUOTE=James Heinrich;412653]Same here... :cool:

But I can generate some numbers to work with. Continuing the above example, I set Prime95 to use 4GB and ran the same exponent through at different TF levels (just long enough to see the bounds and ETA):[code]
Exponent  TF  B1      B2        Prob  Runtime  970TF
73412063  65  785000  30026250  9.87  27h07m
73412063  66  805000  28980000  9.06  26h41m
73412063  67  805000  27168750  8.24  25h33m
73412063  68  805000  25357500  7.46  24h25m
73412063  69  800000  23600000  6.73  23h18m
73412063  70  795000  21663750  6.02  22h03m
73412063  71  765000  19698750  5.35  20h30m
73412063  72  730000  16972500  4.69  18h26m
73412063  73  690000  15007500  4.12  16h45m   0h38m
73412063  74  655000  13263750  3.62  15h18m   1h15m
73412063  75  610000  11895000  3.17  13h58m   2h30m
73412063  76  565000  10452500  2.75  12h35m
73412063  77  535000   9496250  2.40  11h39m
73412063  78  495000   8415000  2.07  10h35m
73412063  79  455000   7166250  1.76   9h22m
73412063  80  410000   6047500  1.47   8h11m
[/code][/QUOTE]

I partially filled in another column; one can simply extrapolate to the bit levels above and below.

So my GTX-970 Extreme can complete the TF from 73 to 74 bits above in about 1.25 hours and save James' P-1 run 1.45 hours.
The next bit level, 74 to 75, would take my card 2.5 hours and save P-1 only 1.33 hours.

However, the odds that the TF finds a factor are about one-third those of P-1.

Mind you if the TF does find a factor it saves the entire P-1 time ...

OK, maybe I need a REAL math/stats person too.
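As a rough sanity check on the 74-to-75 example, here is the expected-value arithmetic with the numbers from the thread. The 1-in-b chance of finding a factor at bit level b is a common heuristic assumed here, not a figure from the thread; and GPU-hours and CPU-hours are of course not directly interchangeable, so this only frames the trade-off.

```python
# Numbers from James' table and petrw1's GTX-970 column above.
p_factor_75 = 1 / 75         # heuristic chance TF 74->75 finds a factor
tf_gpu_hours = 2.5           # GTX-970 time for the 74->75 bit level
pm1_at_74 = 15 + 18 / 60     # P-1 runtime (h) if released at 74 bits
pm1_at_75 = 13 + 58 / 60     # P-1 runtime (h) if released at 75 bits

# Expected CPU-hours saved by TF'ing the extra bit first:
# if TF finds a factor, the whole P-1 run is skipped;
# otherwise P-1 still runs, but with the cheaper 75-bit bounds.
expected_saved = (p_factor_75 * pm1_at_74
                  + (1 - p_factor_75) * (pm1_at_74 - pm1_at_75))
print(f"{tf_gpu_hours:.2f} GPU-hours buys ~{expected_saved:.2f} "
      f"expected CPU-hours of P-1 savings")  # ~1.52 CPU-hours
```

So under this heuristic, 2.5 GPU-hours of TF buys roughly 1.5 expected CPU-hours of P-1 savings; whether that is a good trade depends on which resource (GPU TF capacity or CPU P-1 capacity) is the bottleneck, which is exactly the question being debated.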

Mark Rose 2015-10-14 16:12

I think if we have the TF capacity, we should still fully TF before P-1. It's work that needs to be done anyway, and if it saves P-1 time, it increases the overall system throughput.

If I understand the math correctly, by eliminating the smaller factors, any remaining factor would have to be [url=https://en.wikipedia.org/wiki/Smooth_number#Powersmooth_numbers]more smooth[/url], so the range of potential factors is smaller.

P-1 is like TF in that, past a certain effort level, it's cheaper to just run the full LL test.

So by TF'ing higher, we save P-1 work at no detriment, if I understand correctly.

So the real problem is that we have too many resources doing P-1 work and not LL/DC or TF.

