#111
Tribal Bullet
Oct 2004
3·1,181 Posts
Per a post elsewhere, high-end 2009-era GPUs seem to have significant double-precision potential at a cost slightly higher than a bare-bones PC, so at least the cost hurdle is coming down rapidly.
#112
Oct 2007
2·5³ Posts
For the record, the new version of the CUDA toolkit's FFT library now sports a double-precision FFT implementation:
http://forums.nvidia.com/index.php?showtopic=102548 It is supported only on GT200-based cards, i.e. the GTX 260, GTX 280, etc.
#113
Account Deleted
"Tim Sorbera"
Aug 2006
San Antonio, TX USA
17×251 Posts
#114
Jul 2006
Calgary
5²·17 Posts
#115
Oct 2008
26 Posts
#116
Oct 2007
2·5³ Posts
It doesn't seem worth the effort --- as has been said ad nauseam, double precision on the GT200 sucks: one DP unit per SM.

For a top-of-the-line GTX 285, that gives (1476×10⁶ × 2 × 30) / 10⁹ ≈ 88 GFlops, assuming every operation can be turned into a MAD (a multiply-add, counted as two flops). More realistically it will be closer to 40-50 GFlops, which is what a regular CPU can do. Maybe the GT300 will have proper DP support.

At any rate, a proof of concept wouldn't hurt. I'd be willing to spend some time trying during August, if there were simple enough pseudo-code to begin with.

EDIT: In a simple experiment, a double-precision 2^24-point complex-to-complex FFT using CUFFT on a GTX 260 (theoretical 60 GFlops) takes about 0.0913 seconds, memory transfers excluded. Approximating the operation count of an FFT as 5n log₂ n, i.e. 5 × 2^24 × 24, this gives ((5 × 2^24 × 24) / 0.0913) / 10⁹ ≈ 22 GFlops of FFT. How does this compare to a decent current quad-core?

Last fiddled with by Robert Holmes on 2009-07-29 at 14:53
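The arithmetic above can be reproduced with a quick back-of-the-envelope script. The clock speed, SM count, and FFT timing are the figures quoted in the post; nothing here actually touches a GPU:

```python
import math

# Theoretical DP throughput of a GTX 285: 30 SMs with 1 DP unit each,
# 1476 MHz shader clock, 2 flops per MAD (multiply-add).
clock_hz = 1476e6
sms = 30
flops_per_mad = 2
peak_gflops = clock_hz * flops_per_mad * sms / 1e9  # ~88.6 GFlops

# Achieved throughput of the CUFFT experiment: a 2^24-point
# double-precision complex FFT in 0.0913 s, counting 5*n*log2(n) flops.
n = 2 ** 24
fft_seconds = 0.0913
fft_flops = 5 * n * math.log2(n)
fft_gflops = fft_flops / fft_seconds / 1e9  # ~22 GFlops

print(f"peak: {peak_gflops:.1f} GFlops, achieved: {fft_gflops:.1f} GFlops")
```

The 5n log₂ n figure is only the conventional rough operation count for a complex FFT, so the "achieved GFlops" number is an estimate, not a measurement.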
#117
Oct 2007
2·5³ Posts
Quote:
Timing FFTs using 8 threads on 4 physical CPUs: Best time for 8192K FFT length: 49.532 ms.

Assuming linear scaling, current GPUs are no better than CPUs at this, as pretty much everyone expected.
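Applying the same rough 5n log₂ n operation count to the quoted CPU timing (an 8192K-point, i.e. 2^23-point, FFT in 49.532 ms) gives a comparable GFlops figure for the quad-core — a sketch using only the numbers in the quote:

```python
import math

# 8192K-point FFT on the CPU: n = 8192 * 1024 = 2^23, best time 49.532 ms.
n = 8192 * 1024
cpu_seconds = 49.532e-3

# Same rough operation count used for the GPU estimate: 5 * n * log2(n).
cpu_gflops = 5 * n * math.log2(n) / cpu_seconds / 1e9

print(f"CPU FFT throughput: {cpu_gflops:.1f} GFlops")  # roughly 19-20 GFlops
```

That lands in the same neighborhood as the ~22 GFlops measured for the GTX 260 above, which is what supports the "no better than CPUs" conclusion.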
#118
"Richard B. Woods"
Aug 2002
Wisconsin USA
2²·3·641 Posts
Even if GPUs are now only in the same range of FFT speed as CPUs, a port to CUDA could eventually double the number of processors available to GIMPS -- assuming one GPU per CPU, and that most GPUs eventually become as capable as today's top-of-the-line models.
#119
Jul 2005
Des Moines, Iowa, USA
AA₁₆ Posts
This ^
#120
Oct 2007
Manchester, UK
10101001100₂ Posts
Rather than CUDA, which is for nVidia cards only, might it be worthwhile to write the code in OpenCL or similar?
#121
Mar 2003
Melbourne
5×103 Posts
Good call not to code for the PS3. The latest PS3 revision can't run Linux, so by my extrapolation Sony has put up significant barriers to running third-party code.
-- Craig