[QUOTE=chalsall;503715]Sweet! You are one of the most qualified to do this analysis. Sincerely.
Will you be taking into consideration the available compute, and its SP/DP ratios? :smile:[/QUOTE]
My main objective is simply to settle whether SP-based FFT-mul can be used at all for moduli of interest to GIMPS:
o If not, I will need to revisit my random-walk ROE heuristic to see where it falls short;
o If yes, and my heuristics re. FFT length and modulus size are even close to what I observe in actual practice, an SP-based GPU LL test will be of immediate interest.
But for now, I suggest being pessimistic and assuming that Preda - the only person I know who has actually tried SP for such work - correctly concluded nonfeasibility for such an approach. At least that will be my attitude - expect the worst, but hope for a pleasant surprise. |
I have done only the very preliminary check of using SP library FFT rather than DP library FFT in CUDALucas -- this failed to produce round-off errors less than 0.5 for FFT lengths up to 30000K on exponents in the low 80M range. I am inexperienced in GPU programming and not familiar with numerical methods for FT so I gave up there; I don't know if what I did failed for reasons other than not having enough precision.
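For context on the 0.5 threshold mentioned above: after a floating-point FFT squaring, every output word should be an integer, and the round-off error is the distance between the computed value and the nearest integer. Once that distance reaches 0.5, rounding can no longer recover the right integer. A minimal Python sketch of that check follows; the function names and sample values are hypothetical illustrations, not CUDALucas code.

```python
def max_roundoff_error(values):
    """Return the largest distance from any value to its nearest integer."""
    return max(abs(v - round(v)) for v in values)


def is_safe(values, limit=0.5):
    """True if every value is strictly closer than `limit` to an integer."""
    return max_roundoff_error(values) < limit


# Made-up convolution outputs: the first set rounds unambiguously,
# the second contains a value exactly halfway between two integers.
print(is_safe([3.0, 7.25, -2.0]))
print(is_safe([3.0, 1.5]))
```

Production codes usually warn well below 0.5, since an error close to the limit already suggests some other word may have been rounded the wrong way.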
|
[QUOTE=penlu;503722]I don't know if what I did failed for reasons other than not having enough precision.[/QUOTE]
That's cool. All (or, at least, almost all) attempts are valuable. Quite possibly this simple question from a new participant might result in an optimization. It's happened before.... :smile: |
[QUOTE=chalsall;503692]Short answer: Yes, it is possible.
Longer answer: It doesn't make sense. Many more OPs and/or memory needed. We have some /very/ good GPU programmers here, writing code optimally for this very rarefied problem space. If it was possible to improve the throughput using SP, they would have done it.[/QUOTE] Thank you for replying. Although using single precision will need more OPs and/or memory, the GPU I use actually has 6GB of memory and a 32:1 SP:DP speed ratio, which seems enough for a single-precision LL test. That's why I posted this thread. |
[QUOTE=kriesel;503707]Welcome.
Your English seems nearly perfect. Possibly you'll find the attachment in the second post of [url]https://www.mersenneforum.org/showthread.php?t=23371[/url] useful. Your GTX1060 can run LL (CUDALucas), P-1 (CUDAPm1), or trial factoring (mfaktc). Its contribution would probably be maximized by running trial factoring. Unfortunately there is no CUDA PRP code for Mersenne hunting currently.[/QUOTE] Thank you for replying. Actually, I have already been running mfaktc in the mornings (since the fan is too noisy at night) and got a good result:
Manual testing, [URL="https://www.mersenne.org/report_exponent/?exp_lo=190817147&full=1"]190817147[/URL], F, 2018-12-22 10:02, 0.3, Factor: 3798244479194871411047 / TF: 71-72
The problem is that I cannot rely on the results it produces. mfaktc reports only one of two things: whether a number has a factor or not. If it reports "yes", that is easy to check, but if it reports "no", it just reports "no" and gives me no further information. It is very hard to check whether an "NF" result is reliable, and I just want to generate a reliable (or at least checkable) result. |
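The asymmetry described in the post above ("yes" is easy to check, "no" is not) can be made concrete. A candidate q divides 2^p - 1 exactly when 2^p mod q equals 1, which takes one modular exponentiation. The sketch below is a minimal illustration with hypothetical function names, not mfaktc code; the second function checks the standard form that any factor of a Mersenne number with odd prime exponent must have.

```python
def divides_mersenne(p, q):
    """Return True if q divides 2**p - 1, using modular exponentiation."""
    return pow(2, p, q) == 1


def has_valid_form(p, q):
    """Factors of 2**p - 1 (p an odd prime) satisfy q = 2*k*p + 1 and q % 8 in (1, 7)."""
    return q % (2 * p) == 1 and q % 8 in (1, 7)


# Small known case: 2**11 - 1 = 2047 = 23 * 89.
print(divides_mersenne(11, 23), divides_mersenne(11, 89))  # True True

# The exponent and factor reported in the post above.
p, q = 190817147, 3798244479194871411047
print(divides_mersenne(p, q), has_valid_form(p, q))
```

There is no comparably cheap check for a "no factor" report; confirming it means repeating the search over the same range.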
[QUOTE=Neutron3529;503734]Although using single precision will need more OPs and/or memory, the GPU I use actually has 6GB of memory and a 32:1 SP:DP speed ratio, which seems enough for a single-precision LL test.[/QUOTE]
Sigh... Fortunately some really smart people have taken on your challenge, and we might find out soon if this is even possible, let alone worthwhile. But, FYI, "seems enough" is rarely enough around these here parts.... |
[QUOTE=ewmayer;503695]Flipping things around and asking what SP FFT length is needed to handle current-wavefront exponents gives 24576K = 24M, slightly above 5x the DP FFT length. So being pessimistic, on hardware where there is, say, a 10x or more per-cycle difference between SP and DP throughput, SP could well be a win.[/QUOTE]
5x FFT size means 2.5x memory usage (each SP word is half the size of a DP word). There are serious indications on non-DP-crippled GPUs that LL tests are severely memory-bottlenecked. Increasing the memory usage by 2.5x would just exacerbate the situation, even if theoretically the GPU could otherwise finish the computation sequence faster. On the flip side, this means that smaller FFTs (say < 1M) might benefit more from this, which might be useful for things like LLR, where there are a lot of projects (the Top 5000 entry point is around 1.4 Mbits). |
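The arithmetic behind the figures in this exchange can be sketched in a few lines of Python. The 24576K length and the 5x factor come from the quoted post; the 80,000,000 exponent is only an illustrative value in the "low 80M" range mentioned earlier in the thread, and the function names are hypothetical. Only a single array of FFT words is counted, so real programs will use more memory than this.

```python
SP_BYTES, DP_BYTES = 4, 8  # IEEE 754 single and double precision word sizes


def fft_memory_mib(length_k, bytes_per_word):
    """Memory in MiB for one array of FFT words; length_k is in units of K = 1024 words."""
    return length_k * 1024 * bytes_per_word / 2**20


def memory_ratio(length_factor):
    """Memory of an SP FFT that is length_factor times longer, relative to the DP FFT."""
    return length_factor * SP_BYTES / DP_BYTES


def bits_per_word(exponent, length_k):
    """Average number of bits of the modulus carried by each FFT word."""
    return exponent / (length_k * 1024)


print(memory_ratio(5))                   # 2.5
print(fft_memory_mib(24576, SP_BYTES))   # 96.0
print(bits_per_word(80_000_000, 24576))  # a little over 3 bits per SP word
```

The last figure shows why the SP transform has to be so long: each single-precision word can carry only a few bits of the number being squared.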
[QUOTE=Neutron3529;503735]Thank you for replying me.
Actually, I have already been running mfaktc in the mornings (since the fan is too noisy at night) and got a good result:
Manual testing, [URL="https://www.mersenne.org/report_exponent/?exp_lo=190817147&full=1"]190817147[/URL], F, 2018-12-22 10:02, 0.3, Factor: 3798244479194871411047 / TF: 71-72
The problem is that I cannot rely on the results it produces. mfaktc reports only one of two things: whether a number has a factor or not. If it reports "yes", that is easy to check, but if it reports "no", it just reports "no" and gives me no further information. It is very hard to check whether an "NF" result is reliable, and I just want to generate a reliable (or at least checkable) result.[/QUOTE] Run the self tests if you are concerned. You're right in that it's possible bad/overclocked hardware may have errors. The only way to tell is to look at how many factors you find over time, which should be a little more than 1 in 100 attempts. |
[QUOTE=Mark Rose;503746]Run the self tests if you are concerned.
You're right in that it's possible bad/overclocked hardware may have errors. The only way to tell is to look at how many factors you find over time, which should be a little more than 1 in 100 attempts.[/QUOTE] That may not be true. I found 5 factors in 318 attempts. The secret to finding a factor is to try the lowest possible factor range: for exponents with "no factor from 2^64 to 2^75", it seems less likely that there is a factor in the range 2^75 to 2^76. That implies I may run mfaktc for a whole day and find no factor, which would be so disappointing that I really do not want to see it. |
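One way to put numbers on this exchange is the commonly quoted GIMPS rule of thumb that the chance of a Mersenne number having a factor between 2^k and 2^(k+1) is roughly 1/k. That rule is not stated in this thread and is an assumption here, as is treating all 318 attempts as if they were at the 71-72 bit level of the reported factor. The function names are hypothetical.

```python
def factor_chance(bit_level):
    """Heuristic chance of a factor between 2**bit_level and 2**(bit_level + 1)."""
    return 1.0 / bit_level


def expected_factors(attempts, bit_level):
    """Expected number of factors found in `attempts` independent tries."""
    return attempts * factor_chance(bit_level)


def chance_of_no_factor(attempts, bit_level):
    """Probability that `attempts` independent tries all find nothing."""
    return (1.0 - factor_chance(bit_level)) ** attempts


print(expected_factors(318, 71))     # expected count under the heuristic
print(5 / 318)                       # observed rate reported in the post
print(chance_of_no_factor(100, 75))  # chance of a 100-attempt dry spell at 75 bits
```

Under these assumptions, finding 5 factors in 318 attempts is close to the expected count. The per-attempt chance falls only slowly as the bit level rises, while each attempt takes much longer, so dry spells are normal.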
[QUOTE=chalsall;503739]Sigh...
Fortunately some really smart people have taken on your challenge, and we might find out soon if this is even possible, let alone worthwhile. But, FYI, "seems enough" is rarely enough around these here parts....[/QUOTE] God bless the 32:1 SP:DP ratio works... BTW, my poor English does not tell me how a non-Christian (like me) should express the previous sentence correctly. At first I chose "may", using "[I]may[/I] the force be with [I]you[/I]" as a model to create a sentence like "May the 32:1 SP:DP ratio works." But I suspect it would be confused with sentences such as "May the 32:1 SP:DP ratio works?" or "Will the 32:1 SP:DP ratio works?" Such sentences are more likely to be read as a question, not as the "may" in "[I]may[/I] the force be with [I]you[/I]". I want to know: will native English speakers be confused by the sentence "May the 32:1 SP:DP ratio works"? And, moreover, how should a non-Christian (like me) express the previous sentence correctly? |
[QUOTE=Neutron3529;503757]
I want to know: will native English speakers be confused by the sentence "May the 32:1 SP:DP ratio works"? And, moreover, how should a non-Christian (like me) express the previous sentence correctly?[/QUOTE] I would say "May the 32:1 SP:DP ratio work!" or "I hope the 32:1 SP:DP ratio works!". |