PRP for F33?
Hi,
first of all, sorry if this is a noob question. I am wondering why noone has proven the (non)primality of F33 with PRP testing. It will be a huge task, but should be doable, shouldn't it? [LIST][*]F33 is as large as M8.589.934.591, or in the 8G range.[*]Someone has done a PRP Test for [URL="https://www.mersenne.ca/exponent/999999937"]a 1G exponent[/URL][*]mersenne.ca reports [URL="https://www.mersenne.ca/credit.php?f_exponent=30&worktype=LL"]62.000 GHzd[/URL] for F30[*]So a F33 PRP test will be around 8 times this, i.e. around half a million GHz Days.[*]An 8G number should easily fit in the RAM of a powerful computer, too.[/LIST]So why is noone attempting this task? Of course it would require something like an Amazon c5.metal instance, but if I would be Ben Delo and pushing 50 million GHz Days in the last year, then I would use 1% of this in order to prove the compositeness of F33... 
It is ~8.6 times as many iterations as the 1G exponent, but it would require a much much larger FFT size, so each iteration would take much much longer. Not sure how much longer but total running time is >50 times longer than the 1G exponent and probably a lot more.
Another huge problem is that there are currently no programs that can do a large enough FFT size for a this number. I think the largest currently is CUDALucas 65536K (64M) FFT which is "only" enough for ~1.14G exponents. 
[QUOTE=gLauss;558648][*]So a F33 PRP test will be around 8 times this, i.e. around half a million GHz Days.[/LIST][/QUOTE]
This line is where you fail. FFT size rises with the size of the exponent, mostly linearly. So, you have a quadratic increase in testing time: ~65 times as long a task as F30. Once Ernst or Mihai or George extend the software to handle it, someone may well take it on but at 89x the time estimate you gave, you should see that it's not obvious that it needs doing on current hardware. 
[QUOTE=VBCurtis;558665]This line is where you fail. FFT size rises with the size of the exponent, mostly linearly. So, you have a quadratic increase in testing time: ~65 times as long a task as F30.
Once Ernst or Mihai or George extend the software to handle it, someone may well take it on but at 89x the time estimate you gave, you should see that it's not obvious that it needs doing on current hardware.[/QUOTE] Is there a way we could do a backoftheenvelope calculation: assuming GpuOwl supported Fermat numbers, what FFT size would be required, and how long would a run take on V100/A100/RTX3090 etc? 
Sure double the exponent, double the FFT size. Then add a little bit more, say 10% extra per doubling.
forumite ewmeyer, the developer of cudalucas and mlucas, has done the largest Fermat test that I know of. See [url]https://mersenneforum.org/showthread.php?t=21544[/url] for details of his efforts to run F30. I believe he used 32768K FFT size for F30, so something like 262144K or whatever the next step larger would be sufficient to test F33. F30 took a really long time, though I'm not seeing F33 as a reasonable proposition until GPUs are an order of magnitude faster. 
[QUOTE=ATH;558656]I think the largest currently is CUDALucas 65536K (64M) FFT which is "only" enough for ~1.14G exponents.[/QUOTE]
Actually Prime95 also have 64M AVX512 FFT, and it can probably go a bit higher than 1.14G exponents I would guess, have not tested. I have not checked FMA FFT recently in Prime95, but it goes to at least 50M. 
[QUOTE=VBCurtis;558685]forumite ewmeyer, the developer of cudalucas and mlucas, has done the largest Fermat test that I know of.[/QUOTE]
Small correction, forumite msft made cudaLucas, the first GPUenabled LL test ever, using Nvidia cuFFT libraries, albeit at the time was not called cudaLucas. Other people contributed to it and improved it a lot, later, as well as renamed it to the actual name, but Ernst had not much of a hand in it. We however agree about the other two things  Ernst being the "daddy" of mlucas, which brings LL test to other nonx86 architectures, and he being the one testing the largest fermat number "[URL="https://en.wikipedia.org/wiki/P%C3%A9pin%27s_test"]pepinned [/URL]to the ground" today. (Ha! I think I coined a new term! :razz:) 
Thanks for the answers. I now understand why it wasn't done before:
[LIST=1][*]It is actually a few million GhzDays of work.[*]There is no software which can deal with this number yet.[*]While a few million GHzDays of computation are done by large GIMPS contributors, the problem is that all of this is a single big computation and you cannot simply run it on N different computers in parallel. The best iteration times back in 2017 were around 440ms, see [URL="https://mersenneforum.org/showpost.php?p=453841&postcount=287"]this post from the thread linked above.[/URL] So the calculation would actually take around 8G*0.44s or around 120 years which is longer than the lifetime of a human. Even if we factor in that one might switch the computation to the newest generation of CPU/GPU every few years, it still would require at least a few decades of effort and quite a big investment.[/LIST] 
[QUOTE=gLauss;558726]The best iteration times back in 2017 were around 440ms, see [URL="https://mersenneforum.org/showpost.php?p=453841&postcount=287"]this post from the thread linked above.[/URL][/QUOTE]
Ok, so #2 on the list is not valid then, there is a software that can handle it: Mlucas. Since Ernst got an iteration time on F33, Mlucas must have big enough FFT to handle it, and now that I check I can see it goes to at least 240M FFT (245760K) in an old selftest I did on Mlucas. But the 2 other points are still valid, it takes way too long and it cannot be distributed to many computers. 
Before even thinking on performing the primality check of F33, a lot more trial factoring is needed and then someone has to run P1.

recently , I factored (PM1)a 535 M exponent with PM1, the FFT lenght was "fftlength":31457280 aka 30720k.
[code] [Tue Sep 1 02:03:45 2020] FFTlen=30720K, Type=3, Arch=4, Pass1=1536, Pass2=20480, clm=4 (3 cores, 1 worker): 67.53 ms. Throughput: 14.81 iter/sec. FFTlen=30720K, Type=3, Arch=4, Pass1=1536, Pass2=20480, clm=2 (3 cores, 1 worker): 66.36 ms. Throughput: 15.07 iter/sec. FFTlen=30720K, Type=3, Arch=4, Pass1=1536, Pass2=20480, clm=1 (3 cores, 1 worker): 75.09 ms. Throughput: 13.32 iter/sec. FFTlen=30720K, Type=3, Arch=4, Pass1=2048, Pass2=15360, clm=4 (3 cores, 1 worker): 64.08 ms. Throughput: 15.61 iter/sec. FFTlen=30720K, Type=3, Arch=4, Pass1=2048, Pass2=15360, clm=2 (3 cores, 1 worker): 64.12 ms. Throughput: 15.60 iter/sec. FFTlen=30720K, Type=3, Arch=4, Pass1=2048, Pass2=15360, clm=1 (3 cores, 1 worker): 66.83 ms. Throughput: 14.96 iter/sec. [/code] so, if the exponent is 16 time higher, I guess... 1 sec by iteration? (that was on Prime95) 
All times are UTC. The time now is 07:15. 
Powered by vBulletin® Version 3.8.11
Copyright ©2000  2021, Jelsoft Enterprises Ltd.