mersenneforum.org  

Old 2013-05-29, 03:15   #1915
prime7989
 
Jun 2012

17 Posts
LL on Mersenne, Fermat numbers and Pepin's test

Hi Owftheevil, Manpowre,
My recommended mod to the thread count in the kernel invocation does not give the right results. For example, it reports M110503 as composite instead of prime. So I would suggest leaving that at 128 until someone who understands the CUDALucas 2.05 alpha code comes along and joins the fray.
However, my mod for Fermat numbers can be done in versions 2.03 and 2.05 alpha if
N is calculated in code as 2^2^n+1 instead of 2^p-1. Does anyone know on which lines of code N=2^p-1 is calculated, or is it inherent in the cufft calls?
Thank you!
PS: For example, version 2.03 runs M61787581 with an ETA of 60 hrs.
I will check in a few days whether that is prime.
Also, trial division of Mp by small primes should be added to parse.c.
This can be done simply in parse.c by implementing it with GMP.
Also, is there any GPU code that does Pepin's test
on Fermat numbers?
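
For reference, here is a minimal sketch of the three tests mentioned above: the Lucas-Lehmer test for 2^p-1, Pepin's test for 2^2^n+1, and trial division of Mp. It uses plain Python integers rather than the FFT-based squaring that CUDALucas does on the GPU, so it is only practical for small exponents. The function names and the `k_max` limit are illustrative assumptions, not taken from CUDALucas or parse.c.

```python
def lucas_lehmer(p: int) -> bool:
    """Lucas-Lehmer test: True if 2**p - 1 is prime. Assumes p is an odd prime."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m  # one "iteration": square, subtract 2, reduce mod M_p
    return s == 0


def pepin(n: int) -> bool:
    """Pepin's test with base 3: True if F_n = 2**(2**n) + 1 is prime. Assumes n >= 1."""
    f = (1 << (1 << n)) + 1
    return pow(3, (f - 1) // 2, f) == f - 1


def small_factor(p: int, k_max: int = 10_000):
    """Trial division of M_p = 2**p - 1 for an odd prime p.

    Any factor q of M_p has the form q = 2*k*p + 1 with q % 8 in (1, 7),
    so only those candidates are tried. Returns the first factor found,
    or None if none is found with k <= k_max (k_max is an arbitrary cutoff).
    """
    m = (1 << p) - 1
    for k in range(1, k_max + 1):
        q = 2 * k * p + 1
        if q * q > m:          # past sqrt(M_p): no proper factor left to find
            break
        if q % 8 in (1, 7) and pow(2, p, q) == 1:
            return q           # q divides 2**p - 1
    return None


if __name__ == "__main__":
    print([p for p in (3, 5, 7, 11, 13, 17, 19) if lucas_lehmer(p)])
    print([n for n in range(1, 8) if pepin(n)])
    print(small_factor(11), small_factor(13))
```

The same structure carries over to a GPU version in principle: only the squaring step and the modular reduction change, which is why the choice of modulus matters for the question above.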
Old 2013-05-29, 03:29   #1916
owftheevil
 
"Carl Darby"
Oct 2012
Spring Mountains, Nevada

3^2·5·7 Posts

Well, I certainly understand R.D. Silverman better now.
Old 2013-05-29, 03:46   #1917
frmky
 
Jul 2003
So Cal

2^2·3^2·59 Posts

Quote:
Originally Posted by prime7989
Hi Owftheevil, Manpowre,
... So I would suggest leaving that at 128 until someone who understands the CUDALucas 2.05 alpha code comes along and joins the fray...
Note that Owftheevil does understand the code and has made significant modifications to msft's original code.
Old 2013-05-29, 06:42   #1918
LaurV
Romulan Interpreter
 
Jun 2011
Thailand

10010110111111₂ Posts

Quote:
Originally Posted by kracker
Never mind, I can't "officially" buy a 570 or 580, I think...

Just out of curiosity, how many GHz-days (yes, TF... wrong thread?) do you/can you get on a 680 per $?
I don't own 6xx cards, but from earlier testing and from James' page, their performance is lousy compared with the 5xx. I think what you are after is GHz-days per watt, not per dollar. In the end it comes down to dollars too, but only when you plan to use the cards for a long time, or live in an area where electricity is expensive. The best part of the Keplers (6xx) is the power management: they can throttle down and consume very little compared to the Fermis. They have roughly half the DP performance, or less, but draw a third or a quarter of the power (it depends heavily on which cards you compare, but you are in these ranges), so they have better performance per watt. If you plan to use them for a long time and live in an area where electricity is expensive, they can be a good long-term investment.

James' site is a perfect starting point. You can sort it by other columns too, by clicking on the column head (:P)
[edit: for TF, as I did not see the first time that you asked for TF, the link is here. Read about the last two columns, very good comparison criteria!]

Last fiddled with by LaurV on 2013-05-29 at 06:45
Old 2013-05-29, 08:37   #1919
Manpowre
 
"Svein Johansen"
May 2013
Norway

3·67 Posts

Quote:
Originally Posted by prime7989
Hi Owftheevil, Manpowre,
My recommended mod to the thread count in the kernel invocation does not give the right results. For example, it reports M110503 as composite instead of prime. So I would suggest leaving that at 128 until someone who understands the CUDALucas 2.05 alpha code comes along and joins the fray.
However, my mod for Fermat numbers can be done in versions 2.03 and 2.05 alpha if
N is calculated in code as 2^2^n+1 instead of 2^p-1. Does anyone know on which lines of code N=2^p-1 is calculated, or is it inherent in the cufft calls?
Thank you!
PS: For example, version 2.03 runs M61787581 with an ETA of 60 hrs.
I will check in a few days whether that is prime.
Also, trial division of Mp by small primes should be added to parse.c.
This can be done simply in parse.c by implementing it with GMP.
Also, is there any GPU code that does Pepin's test
on Fermat numbers?
Hi, thanks for this. I had a feeling that the modifications might give wrong results, but I didn't get to test it. I am just testing the 2.05 alpha, and I can't find it to be any quicker than 2.03, and I compiled for sm_20, sm_30 and sm_35 (didn't do sm_13 yet). I also see that 2.05 alpha doesn't pick the most effective FFT length the way 2.03 does, so I am only running 2.03 at the moment.

The only way I can get iteration times below 4 ms is to take the memory clock back up to stock, but then the Titan is not stable; as I understand it, the memory on the back side of the card heats up too much.
Old 2013-05-29, 14:21   #1920
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA

4170₈ Posts

Quote:
Originally Posted by LaurV
James' site is a perfect starting point. You can sort it by other columns too, by clicking on the column head (:P)
[edit: for TF, as I did not see the first time that you asked for TF, the link is here. Read about the last two columns, very good comparison criteria!]
So I guess buying a Titan for TF is terrible... but even for LL testing, is it really worth the grand out of my wallet?

Also, on James's site I saw that the 7970 GHz beats the 580... is that true?
Old 2013-05-29, 15:03   #1921
James Heinrich
 
"James Heinrich"
May 2004
ex-Northern Ontario

D63₁₆ Posts

Quote:
Originally Posted by kracker
I saw that the 7970 GHz beats the 580... is that true?
Bdot has recently released mfakto v0.13 with GPU-sieving support, so AMD has moved up the rankings accordingly.
Old 2013-05-29, 15:27   #1922
LaurV
Romulan Interpreter
 
Jun 2011
Thailand

3·3,221 Posts

Indeed. Looking at the results on the bitcoin forums, I always wondered why Radeons did not beat Nvidias at TF from the very beginning... They are better at integer math, I mean as long as you don't ask them to do much DP arithmetic, but as soon as someone ports an FFT package to OpenCL (there is already one done by Apple, I posted a link somewhere) they may surprise us with their LL test speed too... It hurts me a bit to say that, as I am a pure Nvidia/CUDA guy...
Old 2013-06-01, 10:59   #1923
James Heinrich
 
"James Heinrich"
May 2004
ex-Northern Ontario

23·149 Posts

Quote:
Originally Posted by owftheevil
cufftbench only times the ffts. 1 iteration of an LL test consists of 2 ffts, pointwise multiplication, normalization, and splicing. For a rough equivalence of the two timings, pretend iteration times are a multiple of fft times. A more accurate equivalence is iteration time = 2 * fft + k * n for some constant k and fft length n.
How hard would it be to include a benchmark mode that simulates actual iteration times, not just FFT times?
The FFT benchmark is useful for deciding which FFT size to use for a given exponent, but it's not very useful for helping to predict performance.
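
As a rough illustration of the quoted relation, iteration time = 2 * fft + k * n, the sketch below fits the constant k from a few measured (n, fft time, iteration time) triples and then predicts an iteration time from a cufftbench-style FFT timing. The function names are made up for this example and the sample numbers are placeholders, not measurements from any card.

```python
def fit_overhead(samples):
    """Least-squares estimate of k in  iter_ms = 2*fft_ms + k*n.

    samples: iterable of (n, fft_ms, iter_ms) tuples, where n is the FFT length.
    The fit is forced through the origin, matching the quoted model.
    """
    samples = list(samples)
    num = sum(n * (iter_ms - 2.0 * fft_ms) for n, fft_ms, iter_ms in samples)
    den = sum(n * n for n, _, _ in samples)
    return num / den


def predict_iteration_ms(n, fft_ms, k):
    """Predicted time of one LL iteration for FFT length n."""
    return 2.0 * fft_ms + k * n


if __name__ == "__main__":
    # Placeholder values only (hypothetical, not real benchmark data).
    measured = [
        (1 << 20, 0.40, 1.30),
        (1 << 21, 0.90, 2.80),
        (1 << 22, 1.90, 5.80),
    ]
    k = fit_overhead(measured)
    print(f"k = {k:.3e} ms per FFT element")
    print(f"predicted: {predict_iteration_ms(3 << 20, 1.40, k):.2f} ms/iter")
```

If the residuals of such a fit turn out large on real data, that would suggest the k * n term is too simple, which is one more argument for timing real iterations directly.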
Old 2013-06-01, 13:19   #1924
owftheevil
 
"Carl Darby"
Oct 2012
Spring Mountains, Nevada

3^2·5·7 Posts

Quote:
How hard would it be to include a benchmark mode that simulates actual iteration times, not just FFT times?
The FFT benchmark is useful for deciding which FFT size to use for a given exponent, but it's not very useful for helping to predict performance.
Not too much trouble. The most time consuming part would be finding reasonable exponents for each of the different fft lengths.

This would also be useful for CUDALucas itself. There could be cases where shorter ffts take longer, but yield better iteration times.
Old 2013-06-01, 14:08   #1925
James Heinrich
 
"James Heinrich"
May 2004
ex-Northern Ontario

6543₈ Posts

Quote:
Originally Posted by owftheevil
Not too much trouble. The most time consuming part would be finding reasonable exponents for each of the different fft lengths.
And if I (or someone) could assist by coming up with such a list, that would help the feature get implemented sooner?
If you have any suggestions for methodology for determining suitable exponents then I don't mind preparing that list for you. My method would be guess-and-check: try different exponents and see what FFT size is selected, populate my chart, and then try and fill in the gaps by guess-and-checking at what exponent would best fill the in-between FFT sizes. Would that work?
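
One way the guess-and-check could be automated is a binary search for the largest exponent that still selects a given FFT length. This is only a sketch under stated assumptions: `toy_fft_length_for` is a stand-in for "run the program with exponent p and read back the FFT size it picks", and the candidate lengths and the 18 bits-per-word threshold are invented for the example, not CUDALucas constants. The search also assumes the selected length never decreases as p grows.

```python
FFT_LENGTHS = [1 << 20, 3 << 19, 1 << 21]  # hypothetical candidate lengths
BITS_PER_WORD = 18.0                         # assumed packing density (illustrative)


def toy_fft_length_for(p):
    """Toy selector: smallest length that keeps p / n at or below BITS_PER_WORD."""
    for n in FFT_LENGTHS:
        if p <= BITS_PER_WORD * n:
            return n
    raise ValueError("exponent too large for the toy table")


def largest_exponent_for(length, select, lo, hi):
    """Largest integer p in [lo, hi] with select(p) <= length.

    Assumes select is non-decreasing in p. Returns None if even lo
    needs a longer FFT. The result is not necessarily prime; a real
    list would step down to the nearest prime exponent afterwards.
    """
    if select(lo) > length:
        return None
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if select(mid) <= length:
            lo = mid        # mid still fits: boundary is at mid or above
        else:
            hi = mid - 1    # mid needs a longer FFT: boundary is below mid
    return lo


if __name__ == "__main__":
    top = int(BITS_PER_WORD * FFT_LENGTHS[-1])
    for n in FFT_LENGTHS:
        print(n, largest_exponent_for(n, toy_fft_length_for, 2, top))
```

With a real selector each probe means one program run, so a boundary over a range of a few tens of millions of exponents would take roughly 25 probes per FFT length.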
All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.