2013-09-25, 17:54  #12
Sep 2006
The Netherlands
677 Posts 
holy smoke!
Even if I had the time to get something going on the Teslas here, I wouldn't reach those ranges anytime soon :) Toying with the alphabet now, especially (un)abcd :)
2013-09-25, 23:43  #13
Sep 2006
The Netherlands
1245_{8} Posts 
Moved it from sieving to testing.
Using sllr64 here right now on CPU hardware (Xeon L5420); it tested as the fastest option on the CPU. I remember Jean Penne being busy with some GPGPU software. How has that progressed lately? Does Riesel Prime Search already have a public version of it? Got some Teslas here. They're idle now :)
Last fiddled with by diep on 2013-09-25 at 23:44
2013-09-26, 04:04  #14
"Curtis"
Feb 2005
Riverside, CA
3^{3}·157 Posts 
CUDALLR is available, and in my experience stable. It only uses power-of-2 FFT sizes, and speed improves with larger exponents. The main FFT jump we care about is just over 3M for k=69, so your Teslas would be most useful in the upper 2M range, or over 5M (relative to CPU workers, that is).
Check in the hardware/GPU computing forum; I didn't see the thread when I glanced, but I've been running the program for over a year, and even found a prime for k=5 with it in the 3-megabit range.
Curtis
2013-09-27, 01:19  #15
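To illustrate why a power-of-2-only transform suits some exponent ranges better than others, here is a minimal Python sketch. It assumes a fixed 18 bits stored per FFT word, which is a placeholder figure and not a documented property of CUDALLR, and the helper names are hypothetical. It only shows how much of a padded transform is actually used at a given exponent size; it does not predict where the program's real FFT jumps occur.

```python
def next_power_of_two(n: int) -> int:
    """Return the smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def padded_fft_length(exponent_bits: int, bits_per_word: int = 18) -> int:
    """Words needed for exponent_bits, rounded up to a power of two.

    bits_per_word=18 is an assumed value for illustration only.
    """
    words = -(-exponent_bits // bits_per_word)  # ceiling division
    return next_power_of_two(words)


def utilization(exponent_bits: int, bits_per_word: int = 18) -> float:
    """Fraction of the padded transform that carries real data."""
    words = -(-exponent_bits // bits_per_word)
    return words / padded_fft_length(exponent_bits, bits_per_word)


if __name__ == "__main__":
    # Compare a few exponent sizes under the assumed word size.
    for n in (2_000_000, 2_900_000, 3_100_000, 5_000_000):
        print(n, padded_fft_length(n), round(utilization(n), 3))
```

Under these assumptions, an exponent just below a size boundary fills most of the transform, while one just above it pays for a transform twice as long that is only about half used.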
Sep 2006
The Netherlands
677 Posts 
Quote:
Is that power of 2 the only 'disadvantage' compared with the IBDWT in SSE2 I have running currently? I seem to remember that my own FFT implementation, which also used power-of-2 sizes, had a few other disadvantages (to put it politely) :) The Teslas I have here are 0.5 Tflop in theory (of course that's always 2x more than they can do in terms of instructions; they always assume you can use multiply-add, and I'm not sure whether this FFT can). Looking forward to benchmarking them on this code! Note that on Nvidia it would be possible to run a different code stream on each SIMD. I don't know whether it can still deliver 0.5 Tflop doing that, but if it can, it should be easier to get rid of that power-of-2-sized FFT? Maybe?
Last fiddled with by diep on 2013-09-27 at 01:26

2013-09-27, 02:55  #16
"Curtis"
Feb 2005
Riverside, CA
3^{3}×157 Posts 
I don't recall what msft (user name, not company) said about the limitations of his code; I believe he stopped development shortly after he got it working, in favor of an OpenCL version for the other half of the GPUniverse.
I happen to have plenty of work available near 3M, so I haven't considered alternatives. 
2014-01-05, 15:03  #17
Sep 2006
The Netherlands
1010100101_{2} Posts 
hi,
I found a prime; maybe some of you want to verify that it is prime. How do I report it properly? 69 * 2^2649939 - 1 was found prime here! Thanks, Vincent diep@xs4all.nl in case I don't respond quickly on the forum.
2014-01-05, 15:56  #18
Nov 2003
2×1,811 Posts 
Hi diep
Congratulations! To report it, please create a new prover's code including RPS, Psieve, Srsieve and the software you used to prove it prime, like LLR. Thanks!
2014-01-05, 21:11  #19
Sep 2006
The Netherlands
677 Posts 
Tried all that, let me know if it worked out OK. Thanks!
Paul Underwood verified it with pfgw and has confirmed in the meantime.
2014-01-05, 21:24  #20
"Carlos Pinho"
Oct 2011
Milton Keynes, UK
3·1,543 Posts 
Quote:
http://primes.utm.edu/primes/page.php?id=116841
Last fiddled with by pinhodecarlos on 2014-01-05 at 21:25

2014-01-05, 22:20  #21
Sep 2006
The Netherlands
677 Posts 
thanks for verifying!

2014-01-14, 17:21  #22
Sep 2006
The Netherlands
1245_{8} Posts 
On the L5420 Xeon machines I have here at home, I saw a pretty big jump in testing time when moving up from roughly 2.74 Mbit to 2.76 Mbit.
Testing times increased from roughly 6123 seconds to 7689 seconds. Each CPU has 12 MB of L2 cache, so to speak 3 MB per core. It seems it's the transform causing it, not the hardware. I'm not sure about the internal transform size. If it stores 2.75 Mbit and we assume 18 bits per double, the array would require 2.75 Mbit / 18 bits per double * 8 bytes per double = 2.75 * 8 / 18 ≈ 1.2 MB. Even double that would easily fit in L2. At what Mbit level can I expect another big jump like that? At double this size, 5.5 Mbit?
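The back-of-the-envelope estimate above can be checked with a few lines of Python. This is only a sketch of the same arithmetic: the 18 bits per double is the guess made in the post, not a known property of the LLR transform, and the function name is hypothetical. It also ignores any work buffers or twiddle tables the real code may allocate.

```python
def fft_array_bytes(n_bits: int, bits_per_double: int = 18,
                    bytes_per_double: int = 8) -> int:
    """Bytes needed to hold n_bits spread over doubles.

    bits_per_double=18 is the assumption from the post above.
    """
    doubles = -(-n_bits // bits_per_double)  # ceiling division
    return doubles * bytes_per_double


if __name__ == "__main__":
    size = fft_array_bytes(2_750_000)
    print(f"{size / 1e6:.2f} MB")  # prints "1.22 MB"

    # 12 MB of L2 shared by 4 cores, treated as 3 MB per core.
    per_core_l2 = 3 * 10**6
    print("fits twice over:", 2 * size < per_core_l2)
```

With these inputs the array comes to about 1.22 MB, so even two copies stay under the 3 MB per-core share, which is consistent with the suspicion that the slowdown comes from a transform size change and not from cache limits.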