2012-03-05, 04:31  #100 
"Jerry"
Nov 2011
Vancouver, WA
463_{16} Posts 

2012-03-05, 05:16  #101 
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 89<O<88
3·29·83 Posts 
Why does Prime95 keep roundoff error below .24 if CL and gL both seem to work fine with more than that? (Why not go up to like .45 or something?)

2012-03-05, 05:47  #102 
P90 years forever!
Aug 2002
Yeehaw, FL
1CEF_{16} Posts 
If the average max roundoff error for 1000 iterations is above 0.24, then it is not uncommon for one or more of the tens of millions of iterations to come out above 0.4, which is getting into dangerous territory.
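As a hedged illustration of why that threshold matters: in a floating-point FFT squaring, every output coefficient should be an exact integer, and the roundoff check measures how far each one drifted from the nearest integer. A minimal sketch (not Prime95 code; the signal here is simulated):

```python
import numpy as np

def max_roundoff_error(fft_output):
    """Distance of each FFT-multiply output from the nearest integer.

    Every output coefficient should be an integer; the gap to the
    nearest integer is the roundoff error. A max error creeping toward
    0.5 means a coefficient may round to the wrong integer, silently
    corrupting the residue -- hence the conservative 0.24 threshold.
    """
    return float(np.max(np.abs(fft_output - np.round(fft_output))))

# Simulated coefficients: exact integers plus small floating-point noise.
rng = np.random.default_rng(0)
coeffs = np.round(rng.uniform(-1e6, 1e6, 4096)) + rng.normal(0, 1e-3, 4096)
err = max_roundoff_error(coeffs)
assert err < 0.24  # comfortably inside the safe zone for this noise level
```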

2012-03-05, 06:45  #103 
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 89<O<88
3·29·83 Posts 
Ah, okay, and that's what produces the "reproducible error" message?

2012-03-05, 20:48  #104 
P90 years forever!
Aug 2002
Yeehaw, FL
3^{2}×823 Posts 
I have some ideas for speeding up the carry propagation step. Before proceeding, can anyone tell me what percentage of the time is spent on each of the various steps: forward FFT, pointwise squaring, inverse FFT, and carry propagation (which includes applying weights and converting to and from integer)?
I can describe the ideas to anyone who wants to run with them, or else you'll have to wait for me to install an nVidia environment and learn how to use it. 
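The four steps George lists can be sketched end-to-end for an ordinary base-10 squaring. This is an illustrative sketch only, not gpuLucas code, and it omits the irrational-base weighting that a real LL implementation layers on top:

```python
import numpy as np

def fft_square(digits, base=10):
    """Square a little-endian digit vector via FFT, then carry-propagate.

    Mirrors the four steps of an LL iteration: forward FFT, pointwise
    squaring, inverse FFT, then carry propagation (rounding the real
    outputs back to integers and normalising digits to the base).
    """
    n = 2 * len(digits)                    # room for the full product
    f = np.fft.rfft(digits, n)             # 1. forward FFT
    f *= f                                 # 2. pointwise squaring
    raw = np.fft.irfft(f, n)               # 3. inverse FFT
    ints = np.round(raw).astype(np.int64)  # convert back to integers
    carry, out = 0, []                     # 4. carry propagation
    for v in ints:
        v += carry
        out.append(v % base)
        carry = v // base
    while carry:
        out.append(carry % base)
        carry //= base
    return out

# 123^2 = 15129, digits stored little-endian
assert fft_square([3, 2, 1]) == [9, 2, 1, 5, 1, 0]
```

The carry step is the awkward one to parallelise, since each digit depends on the carry out of the previous digit, which is presumably why it is the target for speedup ideas.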
2012-03-05, 21:37  #105 
Bemusing Prompter
"Danny"
Dec 2002
California
2^{3}×3^{3}×11 Posts 
Hmm, is George hinting that GPU support will be his next project?

2012-03-06, 03:28  #106  
Jan 2011
Dudley, MA, USA
73 Posts 
Code:
Environment: GTX 460, CUDA 3.2, driver 295.20, x86_64

testPrime 216091, signalSize 12288 (profiled approx. 5000 iterations)
  Kernel                     % of Total GPU Time
  CUFFT (both directions)    77.08%
  llintToIrrBal<3>            8.91%
  loadIntToDoubleIBDWT        5.44%
  invDWTproductMinus2         4.92%
  ComplexPointwiseSqr         3.65%

testPrime 26199377, signalSize 1474560 (profiled approx. 9000 iterations)
  CUFFT (both directions)    70.79%
  llintToIrrBal<3>            8.53%
  loadIntToDoubleIBDWT        7.77%
  invDWTproductMinus2         7.71%
  ComplexPointwiseSqr         5.20%
I haven't really looked much at optimising the kernels themselves yet, but am definitely open to ideas. All of the kernels are memory-bound, and the small amount of work done in each of the three small kernels (loadIntToDoubleIBDWT, invDWTproductMinus2, and ComplexPointwiseSqr) bothers me: there is likely too much overhead lost to launching the kernels, compared to the amount of work done. 
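One hedged back-of-envelope on the launch-overhead point: each separate element-wise kernel reads and writes the full signal once, so fusing the three small kernels into a single pass would cut their combined memory traffic roughly threefold, on top of saving two kernel launches per iteration. The numbers below are illustrative estimates, not profiled figures:

```python
# Hypothetical traffic estimate: bytes moved by k separate element-wise
# passes over an n-element double-precision signal, vs. one fused pass.
def traffic_bytes(n, passes, bytes_per_elem=8):
    # each pass reads and writes every element once
    return passes * 2 * n * bytes_per_elem

n = 1_474_560                     # signalSize from the larger test above
separate = traffic_bytes(n, 3)    # three launches, three full passes
fused = traffic_bytes(n, 1)       # one launch, one pass
assert separate == 3 * fused      # fusion cuts the small kernels' traffic 3x
```

This ignores caching and assumes purely streaming access, but for memory-bound kernels it suggests why fusing the three small element-wise stages is attractive.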

2012-03-06, 03:55  #107 
Apr 2010
3^{3} Posts 
Hi, I would like to try this on my GTX 295. I'm running Windows 7 64-bit.
Can someone please post a compiled version of this program for my platform, and also say which version of CUDA I need to install to run it? Thanks. 
2012-03-06, 04:20  #108  
P90 years forever!
Aug 2002
Yeehaw, FL
3^{2}·823 Posts 

2012-03-15, 03:19  #109  
Jan 2011
Dudley, MA, USA
73 Posts 
M( 26171441 )C, 0x449e471e42bfe489, n = 1474560, gpuLucas v0.9.3
Estimated runtime was overenthusiastic; actual runtime was about even with CUDALucas. The estimate calculation has been adjusted slightly to compensate. As I mentioned, the kernels are *very* memory-throughput bound, but I believe I may have already found a few methods to ease some of this strain... need to do more tests. 

2014-07-28, 22:14  #110 
"Ghetto_Child"
Jul 2014
Montreal, QC, Canada
41 Posts 
So I'm wondering: where do I get gpuLucas to test it out? Here? https://github.com/Almajester/gpuLucas ? I'm currently running multiple instances of CUDALucas v2.05Beta on both GPUs of a GTX 295. I'm able to run a single instance of MFaktC v0.20 simultaneously, but that's all, just three instances of these apps together at most. I'd like to try gpuLucas and compare its performance with CUDALucas, but is there a version I can download and test without having to compile it myself? I've never used makefile-based compiling before and I'm really not confident I could use it properly; I'm a Windows user.
Last fiddled with by GhettoChild on 2014-07-28 at 22:15 