mersenneforum.org

Hardware: The prime-crunching on dedicated hardware FAQ (https://www.mersenneforum.org/showthread.php?t=10275)

jasonp 2009-06-09 15:29

Per a post elsewhere, [url="http://www.sccg.sk/~vgsem/data/zima2008/FP64_GPU.pdf"]high-end 2009-era GPUs[/url] seem to have significant double-precision potential at a cost slightly higher than a bare-bones PC, so at least the cost hurdle is coming down rapidly.

Robert Holmes 2009-07-26 19:26

For the record, the new version of the CUDA toolkit's FFT library now sports a double precision FFT implementation:

[url]http://forums.nvidia.com/index.php?showtopic=102548[/url]

Only supported in GT200 cards, i.e. GTX260, GTX280, etc.

Mini-Geek 2009-07-26 19:43

[quote=Robert Holmes;182846]For the record, the new version of the CUDA toolkit's FFT library now sports a double precision FFT implementation:

[URL]http://forums.nvidia.com/index.php?showtopic=102548[/URL]

Only supported in GT200 cards, i.e. GTX260, GTX280, etc.[/quote]
Might it, then, finally be practical to port Prime95 to CUDA?

lfm 2009-07-28 00:15

[QUOTE=Robert Holmes;182846]For the record, the new version of the CUDA toolkit's FFT library now sports a double precision FFT implementation:

[url]http://forums.nvidia.com/index.php?showtopic=102548[/url]

Only supported in GT200 cards, i.e. GTX260, GTX280, etc.[/QUOTE]

But not ALL GT200 cards. Only the 260 and up. The 250 and 210 don't work. Be very careful.

hj47 2009-07-29 11:46

[quote=Mini-Geek;182849]Might it, then, finally be practical to port Prime95 to CUDA?[/quote]

This I would LOVE to know.

Robert Holmes 2009-07-29 14:12

It doesn't seem worth the effort. As said [I]ad nauseam[/I] before, double precision on the GT200 sucks: 1 DP unit per SM.

For a top-of-the-line GTX285 card, this gives us (1476*10^6 * 2 * 30) / 10^9 = 88 GFlops (shader clock in Hz, times 2 flops per MAD, times 30 SMs), assuming all operations can be turned into MADs. More realistically it will be closer to 40-50, which is what a regular CPU can do. Maybe the GT300 will have proper DP support.
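The peak figure above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in the post: the function name and parameter names are made up for illustration, and the inputs (1476 MHz, 30 SMs, 1 DP unit per SM, 2 flops per MAD) are the ones quoted above.

```python
def peak_dp_gflops(shader_clock_mhz, num_sms, dp_units_per_sm=1, flops_per_mad=2):
    """Theoretical double-precision peak, assuming every DP unit
    issues one multiply-add (counted as 2 flops) per shader clock."""
    flops_per_second = shader_clock_mhz * 1e6 * flops_per_mad * dp_units_per_sm * num_sms
    return flops_per_second / 1e9


# GTX285 figures as quoted in the post: 1476 MHz shader clock, 30 SMs.
gtx285_peak = peak_dp_gflops(1476, 30)
print(f"GTX285 theoretical DP peak: {gtx285_peak:.2f} GFlops")
```

This is an upper bound only; as the post says, real code that cannot be expressed purely as MADs will land well below it.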

At any rate, a proof of concept wouldn't hurt. I'd be willing to waste some time trying during August, if there was simple enough pseudo-code to begin with.

EDIT:

After a simple experiment, doing a double-precision 2^24-point complex-to-complex FFT using CUFFT on a GTX260 (theoretical 60 GFlops) takes about 0.0913 seconds, memory transfers excluded. Let's approximate the number of FP operations in the FFT as 5*n*log2(n), i.e. 5 * 2^24 * 24. This gives us ((5 * 2^24 * 24) / 0.0913) / 10^9 ~ 22 GFlops of FFT. How does this compare to a decent current quad-core?
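For anyone who wants to redo the estimate with their own timings, here is a small Python sketch of the same calculation. It assumes the usual 5*n*log2(n) operation count for a complex FFT, as in the post; the function name is hypothetical, and the 0.0913 s and 60 GFlops inputs are the figures reported above.

```python
import math


def fft_gflops(n_points, seconds):
    """Effective FFT throughput, using the 5*n*log2(n) flop-count
    approximation for an n-point complex transform."""
    flops = 5 * n_points * math.log2(n_points)
    return flops / seconds / 1e9


n = 2 ** 24                    # transform length from the experiment
measured = fft_gflops(n, 0.0913)
fraction_of_peak = measured / 60.0   # 60 GFlops is the quoted GTX260 peak

print(f"Effective rate: {measured:.1f} GFlops")
print(f"Fraction of theoretical peak: {fraction_of_peak:.0%}")
```

The 5*n*log2(n) count is a convention for comparing FFT implementations, not a measurement of the operations CUFFT actually executes, so the result is best read as a rough, comparable number.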

Robert Holmes 2009-07-29 15:48

In comparison, a Core i7 950 running the latest version of Prime95 reports:

Timing FFTs using 8 threads on 4 physical CPUs:
Best time for 8192K FFT length: 49.532 ms.

Assuming linear scaling, current GPUs are no better than CPUs at this, as pretty much everyone expected.
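The "linear scaling" step can be spelled out as follows. This is a rough sketch only: it assumes run time grows in proportion to FFT length (ignoring the log factor), and it treats the Prime95 benchmark time and the CUFFT time as measuring comparable work, which may not hold since the two programs may not be timing the same operation. The variable names are made up for illustration; the timings are the ones posted above.

```python
# Timings reported in the thread.
cpu_fft_len = 8192 * 1024      # Prime95 "8192K" FFT length = 2^23
cpu_ms = 49.532                # Core i7 950, 8 threads
gpu_fft_len = 2 ** 24          # CUFFT transform length
gpu_ms = 91.3                  # GTX260, memory transfers excluded

# Scale the CPU time linearly up to the GPU's transform length.
scaled_cpu_ms = cpu_ms * gpu_fft_len / cpu_fft_len
ratio = scaled_cpu_ms / gpu_ms

print(f"CPU time scaled to 2^24: {scaled_cpu_ms:.1f} ms")
print(f"CPU/GPU time ratio: {ratio:.2f}")
```

A ratio close to 1 is what the post means by the GPU being "no better" than the CPU: under these assumptions the two land within about 10% of each other.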

cheesehead 2009-07-29 17:02

[quote=Robert Holmes;183295]Assuming linear scaling, current GPUs are no better than CPUs at this, as pretty much everyone expected.[/quote]
But it's not really necessary for GPUs to be [I]better[/I] than CPUs in order for a port to be worthwhile, is it?

Even if they're now only in the same range of FFT speed, a port to CUDA could eventually double the potential number of processors GIMPS could use, assuming one GPU per CPU and that most GPUs eventually become as capable as the now-top-of-the-line models.

CADavis 2009-07-29 19:15

This ^

lavalamp 2009-07-29 23:55

Rather than CUDA, which is for nVidia cards only, might it be worthwhile to write the code in OpenCL or similar?

nucleon 2009-09-06 01:25

Good call not to code for the PS3. The latest PS3 release can't run Linux, so by my extrapolation Sony has put up significant barriers to running third-party code.

-- Craig

