#1
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
1011011111000₂ Posts
Admin edit (Max): split off from the main "Bigger and better GPU sieving drive: k<10000 n<2M" thread. Normally we prefer to keep discussion and reservations in a central location for each drive, but in this case the hoopla surrounding the drive was seriously overwhelming the actual drive content and making it a little hard to sort out.
Why can't we start sieving now for >10T here without a sievefile? It would make no difference compared with doing it later.
Last fiddled with by mdettweiler on 2010-10-06 at 16:08

#2
A Sunny Moo
Aug 2007
USA (GMT-5)
3×2,083 Posts
Good question. I'll drop a note in the PST forum asking if we can do this. (Since their reservation thread is the "primary" one, we can't open it up until they put the range on their books.)

#3
Account Deleted
"Tim Sorbera"
Aug 2006
San Antonio, TX USA
4267₁₀ Posts
I don't own a CUDA-ready GPU, so unfortunately I can't efficiently participate in this. Out of curiosity, about how fast (for this or other ppsieve sieves) is a GPU compared to a quad on a 32-bit OS? Does the GPU's speed differ significantly between 32- and 64-bit versions?
Also: Wow, GPUs are really changing the landscape of sieving. That is an incredibly enormous range. It will take quite a while to test, unless a program is released to run LLR tests on a GPU, but it looks like before too long, my n~=1.3M primes, currently ranked ~200, won't be nearly as impressive.
Last fiddled with by Mini-Geek on 2010-10-05 at 19:10

#4
A Sunny Moo
Aug 2007
USA (GMT-5)
3×2,083 Posts
Quote:
Quote:
Note again that this depends on the sieve. For instance, on TPS's variable-n sieve with tpsieve-CUDA, the program is CPU bound on a fast GPU like Gary's; thus 64-bit will run significantly faster than 32-bit.
Last fiddled with by mdettweiler on 2010-10-05 at 19:13

#5
May 2007
Kansas; USA
3³·5·7·11 Posts
Tim,

To be more specific: from what Max told me by email, sieving on a GPU would likely be more than 10 times faster than sieving on a 32-bit machine with sr2sieve. BUT...the sieve must cover all k's below a certain limit and must be for base 2. There is no advantage to sieving one or a few k's at a time. It also made no sense to sieve only k=300-1001, which is what we originally started doing until Lennart's offer. We could add k<300 for free, so we did k<=1001. The speed depends on the highest k in the sieve and the P-range and virtually nothing else.

At the moment, the extreme speed gain you get from the sieve comes with tight restrictions. Fortunately such a sieve is highly effective for NPLB because of the way we search across large swaths of k. But even if GPU sieving allowed non-base-2 sieving, it still would not be effective at CRUS.

We are grateful to Lennart for picking up the effort at PrimeGrid. And to think: it was all spearheaded by Max after I bought my first GPU, which I knew little about.

Gary
Last fiddled with by gd_barnes on 2010-10-05 at 23:30

#6
Account Deleted
"Tim Sorbera"
Aug 2006
San Antonio, TX USA
10253₈ Posts
Quote:
Quote:
Hopefully, more CUDA sievers (sr2sieve and sr1sieve or equivalents) will be released eventually, and then CRUS could take advantage of GPUs.
Last fiddled with by Mini-Geek on 2010-10-06 at 00:33

#7
A Sunny Moo
Aug 2007
USA (GMT-5)
3·2,083 Posts
Quote:
I actually did try ppsieve-CUDA on the Prime Sierpinski Project's sieve a couple of weeks back and it did NOT work well. Firstly, ppsieve couldn't allocate a bitmap big enough to hold an n<50M sieve with kmax as high as theirs is. I then tried splitting it up into smaller files (5M at a time, IIRC), and while that worked, it was painfully slow--a few orders of magnitude slower than sr2sieve.

Gary mentioned to me a little while back that it might be worthwhile to try Riesel or Sierp. base 256 (one of those, I forget which) with ppsieve-CUDA. Its kmax is very low relative to the number of k's remaining, so it might well be in ppsieve's "sweet spot".

#8
Jul 2003
So Cal
2×3⁴×13 Posts
Quote:
Code:
./ppsieve-cuda-x86_64-linux -R -k3 -K10000 -N2e6 -p10000e9 -P10001e9 -q
ppsieve version cuda-0.2.1a (testing)
Compiled Oct 5 2010 with GCC 4.3.3
nstart=72, nstep=30
ppsieve initialized: 3 <= k <= 10001, 72 <= n < 2000000
Sieve started: 10000000000000 <= p < 10001000000000
Thread 0 starting
Detected GPU 0: GeForce GTX 480
Detected compute capability: 2.0
Detected 15 multiprocessors.
nstep changed to 22
p=10000918814721, 5.103M p/sec, 0.07 CPU cores, 91.9% done. ETA 05 Oct 18:27
Thread 0 completed
Waiting for threads to exit
Sieve complete: 10000000000000 <= p < 10001000000000
Found 3617 factors
count=33405006,sum=0x1c1b8d0e01e3e0ea
Elapsed time: 196.30 sec. (0.01 init + 196.29 sieve) at 5094920 p/sec.

Last fiddled with by frmky on 2010-10-06 at 01:35
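As a sanity check on the log above, here is a minimal Python sketch that recomputes the sieve rate from the width of the p-range and the sieve time. It is not part of ppsieve; the helper name `sieve_rate` and the regular expressions are assumptions written only against the two log lines quoted here, so other ppsieve versions may format their output differently.

```python
import re

# Two lines copied verbatim from the ppsieve output quoted in this post.
LOG = """Sieve complete: 10000000000000 <= p < 10001000000000
Elapsed time: 196.30 sec. (0.01 init + 196.29 sieve) at 5094920 p/sec."""


def sieve_rate(text):
    """Return (range_width, sieve_seconds, p_per_sec) recomputed from log text.

    Hypothetical helper: it only understands the two line formats shown above.
    """
    lo, hi = map(int, re.search(r"Sieve complete: (\d+) <= p < (\d+)", text).groups())
    sieve_sec = float(re.search(r"\+ ([\d.]+) sieve\)", text).group(1))
    width = hi - lo
    return width, sieve_sec, width / sieve_sec


width, secs, rate = sieve_rate(LOG)
print(f"{width} p sieved in {secs} s -> {rate / 1e6:.3f}M p/sec")
```

The recomputed figure should land within a small fraction of a percent of the 5094920 p/sec that ppsieve prints; any residual gap presumably comes from how the program rounds or times the run.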

#9
Account Deleted
"Tim Sorbera"
Aug 2006
San Antonio, TX USA
17·251 Posts
Wow! That makes it 55 hours/T (or 2.3 days/T) with the GPU vs 29 days/T with the quad. And that was with an entire quad working on it... Granted, this is an expensive GPU ($450 at a quick glance at Newegg), but that's still a huge difference in speed: roughly 12.5 times the throughput of a 32-bit quad. On cost-effectiveness alone (ignoring electricity, MB, and other overhead), the quad would have to cost about $36 to compare.
Last fiddled with by Mini-Geek on 2010-10-06 at 02:23
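The arithmetic behind those figures can be sketched in a few lines of Python. This only uses numbers already posted in the thread (the GTX 480 rate from the ppsieve log, the 29 days/T quad figure, and the rough $450 price); the variable names are made up for illustration, and no new measurements are implied.

```python
# Figures taken from the posts above; nothing here is newly measured.
GPU_P_PER_SEC = 5_094_920   # GTX 480 rate reported in the ppsieve log
QUAD_DAYS_PER_T = 29        # 32-bit quad figure quoted in this post
GPU_PRICE_USD = 450         # rough Newegg price quoted in this post

T = 10**12                  # one "T" of p-range

gpu_hours_per_t = T / GPU_P_PER_SEC / 3600
gpu_days_per_t = gpu_hours_per_t / 24

# How many times more p-range the GPU covers per day than the quad.
speedup = QUAD_DAYS_PER_T / gpu_days_per_t

# Price at which the quad would match the GPU's throughput per dollar
# (hardware cost only; electricity and other overhead are ignored).
break_even_quad_price = GPU_PRICE_USD / speedup

print(f"GPU: {gpu_hours_per_t:.1f} hours/T ({gpu_days_per_t:.2f} days/T)")
print(f"Speedup over quad: {speedup:.1f}x")
print(f"Break-even quad price: ${break_even_quad_price:.0f}")
```

Working from the unrounded rate gives results slightly different from the rounded 2.3 days/T used in the post, but they stay in the same ballpark: a bit under 55 hours/T, a speedup between 12x and 13x, and a break-even price in the mid $30s.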

#10
A Sunny Moo
Aug 2007
USA (GMT-5)
14151₈ Posts
Quote:
I'll get a range reserved for and started on Gary's GPU sometime today or tomorrow--at that point I'll be able to give some exact figures for p/sec.

#11
Jan 2005
Caught in a sieve
5×79 Posts
From what I can tell, a stock-clocked GTX 460 will only be about half as fast as a stock-clocked GTX 480. It seems to have to do with the 460 needing instruction-level parallelism. The only two good ways I can think of to provide that are to either recompile the client with the latest CUDA SDK or to vectorize the work like I did for AMD. Neither option is all that easy, but compiling with the latest SDK on a clean VM is probably easier.
The good news is that you're only at 10T right now. Somewhere between 21T and 40T you'll get a significant speed boost.