Per a post elsewhere, [url="http://www.sccg.sk/~vgsem/data/zima2008/FP64_GPU.pdf"]high-end 2009-era GPUs[/url] seem to have significant double-precision potential at a cost slightly higher than a bare-bones PC, so at least the cost hurdle is coming down rapidly.
|
For the record, the new version of the CUDA toolkit's FFT library now sports a double precision FFT implementation:
[url]http://forums.nvidia.com/index.php?showtopic=102548[/url] Only supported in GT200 cards, i.e. GTX260, GTX280, etc. |
[quote=Robert Holmes;182846]For the record, the new version of the CUDA toolkit's FFT library now sports a double precision FFT implementation:
[URL]http://forums.nvidia.com/index.php?showtopic=102548[/URL] Only supported in GT200 cards, i.e. GTX260, GTX280, etc.[/quote] Might it, then, finally be practical to port Prime95 to CUDA? |
[QUOTE=Robert Holmes;182846]For the record, the new version of the CUDA toolkit's FFT library now sports a double precision FFT implementation:
[url]http://forums.nvidia.com/index.php?showtopic=102548[/url] Only supported in GT200 cards, i.e. GTX260, GTX280, etc.[/QUOTE] But not ALL GT200 cards. Only the 260 and up. The 250 and 210 don't work. Be very careful. |
[quote=Mini-Geek;182849]Might it, then, finally be practical to port Prime95 to CUDA?[/quote]
This I would LOVE to know. |
It doesn't seem worth the effort --- as has been said [I]ad nauseam[/I] before, double precision on the GT200 sucks: 1 DP unit per SM.

For a top-of-the-line GTX285 card, this gives (1476*10^6 * 2 * 30) / 10^9 ~ 88 GFlops, assuming all operations can be turned into MADs. More realistically it will be closer to 40-50 GFlops, which is what a regular CPU can do. Maybe the GT300 will have proper DP support.

At any rate, a proof of concept wouldn't hurt. I'd be willing to spend some time trying during August, if there were simple enough pseudo-code to begin with.

EDIT: After a simple experiment, a double-precision 2^24 complex-to-complex FFT using CUFFT on a GTX260 (theoretical 60 GFlops) takes about 0.0913 seconds, memory transfers excluded. Approximating the number of FP operations in the FFT as 5*n log2 n, i.e. 5 * 2^24 * 24, this gives ((5 * 2^24 * 24) / 0.09) / 10^9 ~ 22 GFlops of FFT throughput. How does this compare to a decent current quad-core? |
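The peak-rate arithmetic above can be sketched in a few lines of Python. The GTX285 figures (1476 MHz shader clock, 30 SMs, 1 DP unit per SM, 2 flops per MAD) are taken directly from the post; this is only the theoretical upper bound, not a measured rate.

```python
# Theoretical double-precision peak for a GTX285, per the GT200 layout
# described above: 1 DP unit per SM, each issuing one MAD (2 flops)
# per shader clock cycle.
shader_clock_hz = 1476e6
flops_per_mad = 2
sm_count = 30
dp_units_per_sm = 1

peak_gflops = shader_clock_hz * flops_per_mad * sm_count * dp_units_per_sm / 1e9
print(f"GTX285 DP peak: {peak_gflops:.1f} GFlops")  # ~88.6
```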
[quote=Robert Holmes;183279]EDIT: After a simple experiment, doing a double precision 2^24 Complex to Complex FFT using CUFFT in a GTX260 (theoretical 60 GFlops) takes about 0.0913 seconds, memory transfers excluded. Let's approximate the number of FP operations in the FFT as 5*n log n, i.e. 5 * 2^24 * 24 --- This gives us ((5 * 2^24 * 24) / 0.09) / 10^9 ~ 22 GFlops of FFT. How does this compare to a decent current quadcore?[/quote] In comparison, a Core i7 950 running the latest version of Prime95 says:

Timing FFTs using 8 threads on 4 physical CPUs: Best time for 8192K FFT length: 49.532 ms.

Assuming linear scaling, current GPUs are no better than CPUs at this, as pretty much everyone expected. |
[quote=Robert Holmes;183295]Assuming the linear scaling, current GPUs are no better than CPUs at this, as pretty much everyone expected.[/quote]But it's not really necessary for GPUs to be [I]better[/I] than CPUs in order for a port to be worthwhile, is it?
Even if they're now only in the same range of FFT speed, a port to CUDA could eventually double the potential number of processors GIMPS could use -- assuming one GPU per CPU, and that most GPUs eventually become as capable as today's top-of-the-line models. |
This ^
|
Rather than CUDA, which is for nVidia cards only, might it be worthwhile to write the code in OpenCL or similar?
|
Good call not to code for the PS3. The latest PS3 revision can't run Linux, so by my extrapolation Sony has put up significant barriers to running 3rd-party code.
-- Craig |