LL on Mersenne, Fermat numbers and Pepin's test
Hi Owftheevil,Manpowre
My recommended mod to the number of threads in the kernel invocation does not give the right results. For example, it says M110503 is composite instead of prime. So I would suggest leaving that at 128 until someone who understands the CUDALucas 2.05 alpha code comes along and joins the fray. However, my mod for Fermat numbers can be done in versions 2.03 and 2.05 alpha if n is calculated in code as 2^2^n+1 instead of 2^p-1. Does anyone know on which lines of code n=2^p-1 is calculated, or is it inherent in the cufft calls? Thank you! PS: For example, version 2.03 runs M61787581 with an ETA of 60 hrs. I will check over a few days whether it is prime. Also to be added to parse.c is trial division of Mp by small primes. This can be done simply in parse.c by implementing it in GMP. Also, is there any GPU code that does a Pepin's test on Fermat numbers? |
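For reference, the three tests mentioned in this post can be sketched in a few lines of plain Python. This is only a minimal CPU sketch using Python's built-in big integers, not the CUDALucas or GMP code; the function names `lucas_lehmer`, `pepin` and `small_factor` are made up for illustration. The trial-division part relies on the known fact that any factor of Mp (p an odd prime) has the form 2kp+1 and is congruent to 1 or 7 mod 8, which is one possible way to do it, not necessarily what parse.c would do.

```python
def lucas_lehmer(p: int) -> bool:
    """Return True if M_p = 2**p - 1 is prime. p itself must be prime."""
    if p == 2:
        return True  # M_2 = 3; the recurrence below is only valid for odd p
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m  # one LL iteration: square, subtract 2, reduce
    return s == 0


def pepin(n: int) -> bool:
    """Return True if F_n = 2**(2**n) + 1 is prime (valid for n >= 1)."""
    f = (1 << (1 << n)) + 1
    # Pepin: F_n is prime iff 3**((F_n - 1) / 2) == -1 (mod F_n)
    return pow(3, (f - 1) // 2, f) == f - 1


def small_factor(p: int, k_max: int = 10_000):
    """Look for a factor q = 2*k*p + 1 of M_p (p an odd prime), k <= k_max.

    Returns the first factor found, or None if there is none in range.
    """
    m = (1 << p) - 1
    for k in range(1, k_max + 1):
        q = 2 * k * p + 1
        if q * q > m:
            break  # no proper factor can exceed sqrt(M_p)
        if q % 8 in (1, 7) and pow(2, p, q) == 1:
            return q  # 2**p == 1 (mod q) means q divides M_p
    return None
```

For small inputs, `small_factor(11)` finds 23 (M11 = 2047 = 23 * 89), while `lucas_lehmer(13)` reports M13 as prime. Exponents the size of M110503 or M61787581 are far too slow this way, which is the whole point of doing the squaring with FFTs on the GPU.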
Well, I certainly understand R.D. Silverman better now.
|
[QUOTE=prime7989;341838]Hi Owftheevil,Manpowre
... So I would suggest leaving that at 128 until someone who understands the CUDALucas 2.05 alpha code comes along and joins the fray...[/QUOTE] Note that Owftheevil does understand the code and has made significant modifications to msft's original code. :smile: |
[QUOTE=kracker;341832]Never mind I can't "officially" buy a 570 or 580 I think...
Just for my curiosity, how many Ghzdays(Yes, TF..wrong thread?) do you/can you get on at 680 per $?[/QUOTE] I don't own 6xx cards, but from earlier testing and from James' page, their performance is lousy compared with 5xx. I think you are getting at GHzD per watt, not per dollar. In the end it comes down to dollars too, but only when you plan to use the cards for a long time or live in an area where electricity is expensive. The best part of Keplers (6xx) is the power management: Keplers can reduce power and consume very little compared to Fermis. They have half the DP performance, or less, but consume a third or a quarter of the power (it depends heavily on which cards you compare, but you are in these ranges), so they have better performance per watt. If you plan to use them for a long time and live in an area where electricity is expensive, they can be a good long-term investment. [URL="http://www.mersenne.ca/cudalucas.php?sort=gpw"]James' site[/URL] is a perfect starting point. You can sort it by other columns too, by clicking on the column head (:P) [edit: for TF, as I did not see the first time that you asked for TF, the [URL="http://www.mersenne.ca/mfaktc.php?sort=jvr"]link is here[/URL]. Read about the last two columns, very good comparison criteria!] |
[QUOTE=prime7989;341838]Hi Owftheevil,Manpowre
My recommended mod to the number of threads in the kernel invocation does not give the right results. For example, it says M110503 is composite instead of prime. So I would suggest leaving that at 128 until someone who understands the CUDALucas 2.05 alpha code comes along and joins the fray. However, my mod for Fermat numbers can be done in versions 2.03 and 2.05 alpha if n is calculated in code as 2^2^n+1 instead of 2^p-1. Does anyone know on which lines of code n=2^p-1 is calculated, or is it inherent in the cufft calls? Thank you! PS: For example, version 2.03 runs M61787581 with an ETA of 60 hrs. I will check over a few days whether it is prime. Also to be added to parse.c is trial division of Mp by small primes. This can be done simply in parse.c by implementing it in GMP. Also, is there any GPU code that does a Pepin's test on Fermat numbers?[/QUOTE] Hi. Thanks for this. I had a feeling that the modifications might end up giving wrong results, but I didn't get to test it. I am just testing the 2.05 alpha, and I can't find it to be any quicker than 2.03, and I compiled for sm_20, sm_30 and sm_35 (I didn't do sm_13 yet). I also see that 2.05 alpha doesn't set the FFT length to the most effective one the way 2.03 does, so I am only running 2.03 at the moment. The only way I can get iterations below 4 ms each is to take the memory clock back up to stock, but that makes the Titan unstable; as I understand it, the memory on the back side of the card heats up too much. |
[QUOTE=LaurV;341846]
[URL="http://www.mersenne.ca/cudalucas.php?sort=gpw"]James' site[/URL] is a perfect starting point. You can sort it by other columns too, by clicking on the column head (:P) [edit: for TF, as I did not see the first time that you asked for TF, the [URL="http://www.mersenne.ca/mfaktc.php?sort=jvr"]link is here[/URL]. Read about the last two columns, very good comparison criteria!][/QUOTE] So I guess buying a Titan for TF is terrible... but even for LL testing, is it really worth the grand from my wallet? Also, on James's site I saw that the 7970 GHz beats the 580... is that true? :confused: |
[QUOTE=kracker;341879]I saw that the 7970 GHz beats the 580... is that true? :confused:[/QUOTE][i]Bdot[/i] has recently released mfakto v0.13 with GPU-sieving support, so AMD has moved up the rankings accordingly.
|
Indeed. Looking at the results on the bitcoin forums, I always wondered, from the very beginning, why Radeons did not beat Nvidias at TF... :razz: They are better at integer math, I mean as long as you don't ask them to do much DP calculation, but as soon as someone ports an FFT package to OpenCL (there is already one done by Apple, I posted a link somewhere) they may surprise us with LL test speed too... It hurts me a bit to say that, as I am a pure Nvidia/CUDA guy...
|
[QUOTE=owftheevil;341535]cufftbench only times the ffts. 1 iteration of an LL test consists of 2 ffts, pointwise multiplication, normalization, and splicing. For a rough equivalence of the two timings, pretend iteration times are a multiple of fft times. A more accurate equivalence is iteration time = 2 * fft + k * n for some constant k and fft length n.[/QUOTE]How hard would it be to include a benchmark mode that simulates actual iteration times, not just FFT times?
The FFT benchmark is useful for deciding which FFT size to use for a given exponent, but it's not very useful for helping to predict performance. |
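The relation quoted from owftheevil above (iteration time = 2 * fft + k * n) is easy to play with once one real iteration time has been measured. A rough sketch, assuming times in milliseconds; the function names and the sample numbers in the usage note are hypothetical, and k is whatever falls out of a single measurement on a given card, not a value taken from CUDALucas.

```python
def fit_k(iter_ms: float, fft_ms: float, n: int) -> float:
    """Solve iter = 2*fft + k*n for k from one measured iteration."""
    return (iter_ms - 2.0 * fft_ms) / n


def estimate_iteration_ms(fft_ms: float, n: int, k: float) -> float:
    """Predict the iteration time from a cufftbench-style FFT timing."""
    return 2.0 * fft_ms + k * n


def eta_hours(p: int, iter_ms: float) -> float:
    """Total LL run time for exponent p: p - 2 iterations in all."""
    return (p - 2) * iter_ms / 3_600_000.0
```

For example, `eta_hours(61787581, 3.5)` comes out at roughly 60 hours, which is in line with the ETA and the "less than 4ms" iteration times mentioned earlier in the thread.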
[QUOTE]How hard would it be to include a benchmark mode that simulates actual iteration times, not just FFT times?
The FFT benchmark is useful for deciding which FFT size to use for a given exponent, but it's not very useful for helping to predict performance. [/QUOTE] Not too much trouble. The most time consuming part would be finding reasonable exponents for each of the different fft lengths. This would also be useful for CUDALucas itself. There could be cases where shorter ffts take longer, but yield better iteration times. |
[QUOTE=owftheevil;342234]Not too much trouble. The most time consuming part would be finding reasonable exponents for each of the different fft lengths.[/QUOTE]And if I (or someone) could assist by coming up with such a list, that would help the feature get implemented sooner? :smile:
If you have any suggestions for methodology for determining suitable exponents then I don't mind preparing that list for you. My method would be guess-and-check: try different exponents and see what FFT size is selected, populate my chart, and then try and fill in the gaps by guess-and-checking at what exponent would best fill the in-between FFT sizes. Would that work? |
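The guess-and-check idea can be automated with a bisection, provided the selected FFT length never decreases as the exponent grows. The sketch below is hypothetical throughout: `toy_selector` is only a stand-in for "run the program and read which FFT size it picked", and the 18 bits-per-word limit and the power-of-two lengths are invented for the example, not taken from CUDALucas. It also ignores that real test exponents have to be prime.

```python
from typing import Callable, Optional, Sequence


def toy_selector(exponent: int, lengths: Sequence[int],
                 max_bits_per_word: float = 18.0) -> int:
    """Stand-in selector: smallest length keeping exponent/length in range."""
    for n in sorted(lengths):
        if exponent / n <= max_bits_per_word:
            return n
    raise ValueError("exponent too large for the available FFT lengths")


def largest_exponent_for(length: int, select: Callable[[int], int],
                         lo: int, hi: int) -> Optional[int]:
    """Largest exponent in [lo, hi] whose selected FFT length is <= length.

    Assumes select() is non-decreasing in the exponent.
    """
    if select(lo) > length:
        return None  # even the smallest exponent needs a longer FFT
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if select(mid) <= length:
            lo = mid  # mid still fits, look higher
        else:
            hi = mid - 1  # mid needs a longer FFT, look lower
    return lo
```

Running this once per FFT length gives the upper end of each exponent range, and an exponent a little below that boundary would be a reasonable benchmark candidate for that length. With a real selector each call means starting the program, so the logarithmic number of probes matters.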