mersenneforum.org  

2009-06-09, 15:29   #111
jasonp
Tribal Bullet
Oct 2004
3×1,181 Posts

Per a post elsewhere, high-end 2009-era GPUs seem to have significant double-precision potential at a cost slightly higher than a bare-bones PC, so at least the cost hurdle is coming down rapidly.

2009-07-26, 19:26   #112
Robert Holmes
Oct 2007
2×53 Posts

For the record, the new version of the CUDA toolkit's FFT library now sports a double precision FFT implementation:

http://forums.nvidia.com/index.php?showtopic=102548

Only supported in GT200 cards, i.e. GTX260, GTX280, etc.

2009-07-26, 19:43   #113
Mini-Geek
Account Deleted
"Tim Sorbera"
Aug 2006
San Antonio, TX USA
17×251 Posts

Quote:
Originally Posted by Robert Holmes View Post
For the record, the new version of the CUDA toolkit's FFT library now sports a double precision FFT implementation:

http://forums.nvidia.com/index.php?showtopic=102548

Only supported in GT200 cards, i.e. GTX260, GTX280, etc.
Might it, then, finally be practical to port Prime95 to CUDA?

2009-07-28, 00:15   #114
lfm
Jul 2006
Calgary
5²·17 Posts

Quote:
Originally Posted by Robert Holmes View Post
For the record, the new version of the CUDA toolkit's FFT library now sports a double precision FFT implementation:

http://forums.nvidia.com/index.php?showtopic=102548

Only supported in GT200 cards, i.e. GTX260, GTX280, etc.
But not ALL 200-series cards. Only the GTX 260 and up; the GTS 250 and the 210 don't work. Be very careful.

2009-07-29, 11:46   #115
hj47
Oct 2008
26 Posts

Quote:
Originally Posted by Mini-Geek View Post
Might it, then, finally be practical to port Prime95 to CUDA?
This I would LOVE to know.

2009-07-29, 14:12   #116
Robert Holmes
Oct 2007
152₈ Posts

It doesn't seem worth the effort --- as said ad nauseam before, double precision on the GT200 sucks: 1 DP unit per SM.

For a top-of-the-line GTX285 card (1476 MHz shader clock, 30 SMs, 2 flops per MAD), this gives us (1476*10^6 * 2 * 30) / 10^9 ~ 88 GFlops, assuming all operations can be turned into MADs. More realistically it will be closer to 40-50, which is about what a regular CPU can do. Maybe the GT300 will have proper DP support.
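For anyone who wants to redo the peak estimate for another card, here is a minimal sketch of the same arithmetic. The function name and parameters are made up for illustration; the only inputs taken from the post are the 1476 MHz clock, the 30 SMs, one DP unit per SM, and 2 flops per MAD.

```python
def peak_dp_gflops(shader_clock_mhz: float, num_sms: int,
                   dp_units_per_sm: int = 1, flops_per_mad: int = 2) -> float:
    """Theoretical double-precision peak in GFlops.

    Assumes every DP unit issues one multiply-add (2 flops) per clock,
    which is the best case and not a realistic sustained rate.
    """
    flops_per_second = (shader_clock_mhz * 1e6
                        * flops_per_mad * dp_units_per_sm * num_sms)
    return flops_per_second / 1e9


# GTX285 figures as quoted in the post above.
gtx285 = peak_dp_gflops(shader_clock_mhz=1476, num_sms=30)
print(f"GTX285 theoretical DP peak: {gtx285:.1f} GFlops")
```

Plugging in a different clock or SM count gives the corresponding upper bound; real kernels will land well below it.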

At any rate, a proof of concept wouldn't hurt. I'd be willing to waste some time trying during August, if there was simple enough pseudo-code to begin with.

EDIT:

After a simple experiment, a double-precision 2^24-point complex-to-complex FFT using CUFFT on a GTX260 (theoretical 60 GFlops) takes about 0.0913 seconds, memory transfers excluded. Let's approximate the number of FP operations in the FFT as 5*n*log2(n), i.e. 5 * 2^24 * 24 --- this gives us ((5 * 2^24 * 24) / 0.09) / 10^9 ~ 22 GFlops of FFT. How does this compare to a decent current quad-core?
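The throughput estimate above can be reproduced with a few lines. This is only a sketch of the 5*n*log2(n) approximation used in the post, not a benchmark; the function name is hypothetical, and the 0.0913 s timing and 60 GFlops peak are the figures quoted above.

```python
import math


def fft_gflops(n: int, seconds: float) -> float:
    """Effective GFlops of a length-n complex FFT.

    Uses the common 5*n*log2(n) operation-count approximation,
    so the result is an estimate, not a hardware counter reading.
    """
    flops = 5 * n * math.log2(n)
    return flops / seconds / 1e9


n = 2 ** 24                     # transform length from the experiment
measured = fft_gflops(n, 0.0913)
fraction_of_peak = measured / 60.0   # 60 GFlops: theoretical GTX260 peak quoted above

print(f"Effective FFT rate: {measured:.1f} GFlops")
print(f"Fraction of theoretical peak: {fraction_of_peak:.0%}")
```

Using the unrounded 0.0913 s instead of 0.09 s moves the result only slightly, so the ~22 GFlops figure holds either way.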

Last fiddled with by Robert Holmes on 2009-07-29 at 14:53

2009-07-29, 15:48   #117
Robert Holmes
Oct 2007
2×53 Posts

In comparison, a Core i7 950 running the latest version of prime95 reports:

Timing FFTs using 8 threads on 4 physical CPUs:
Best time for 8192K FFT length: 49.532 ms.

Assuming linear scaling, current GPUs are no better than CPUs at this, as pretty much everyone expected.
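One way to read the "linear scaling" step is sketched below: scale the GTX260 time for the 2^24 transform down to the 8192K (2^23) length and set it next to the prime95 time. That reading is an assumption, and so is treating the two timings as comparable at all, since a single complex-to-complex CUFFT transform and a prime95 FFT-length benchmark may not measure the same amount of work. All names are made up for illustration.

```python
import math

GPU_N = 2 ** 24          # CUFFT transform length from the GTX260 experiment
GPU_SECONDS = 0.0913     # measured GPU time for that transform
CPU_N = 8192 * 1024      # prime95 "8192K" FFT length (2^23)
CPU_MS = 49.532          # prime95 best time on the Core i7 950


def scale_linear(seconds: float, n_from: int, n_to: int) -> float:
    """Rescale a timing assuming run time is proportional to n."""
    return seconds * n_to / n_from


def scale_nlogn(seconds: float, n_from: int, n_to: int) -> float:
    """Rescale a timing assuming run time is proportional to n*log2(n)."""
    return seconds * (n_to * math.log2(n_to)) / (n_from * math.log2(n_from))


gpu_linear_ms = 1000 * scale_linear(GPU_SECONDS, GPU_N, CPU_N)
gpu_nlogn_ms = 1000 * scale_nlogn(GPU_SECONDS, GPU_N, CPU_N)

print(f"GPU scaled to 8192K (linear):  {gpu_linear_ms:.1f} ms")
print(f"GPU scaled to 8192K (n log n): {gpu_nlogn_ms:.1f} ms")
print(f"CPU (prime95, 8192K):          {CPU_MS:.1f} ms")
```

Under either scaling rule the GPU estimate lands in the mid-40 ms range, the same ballpark as the CPU figure, which matches the conclusion in the post.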

2009-07-29, 17:02   #118
cheesehead
"Richard B. Woods"
Aug 2002
Wisconsin USA
2²·3·641 Posts

Quote:
Originally Posted by Robert Holmes View Post
Assuming the linear scaling, current GPUs are no better than CPUs at this, as pretty much everyone expected.
But it's not really necessary for GPUs to be better than CPUs in order for a port to be worthwhile, is it?

Even if they're now only in the same range of FFT speed, a port to CUDA could eventually double the potential number of processors GIMPS could use -- assuming one GPU per CPU, and that eventually most GPUs are as capable as the current top-of-the-line models.

2009-07-29, 19:15   #119
CADavis
Jul 2005
Des Moines, Iowa, USA
252₈ Posts

This ^

2009-07-29, 23:55   #120
lavalamp
Oct 2007
Manchester, UK
2²·3·113 Posts

Rather than CUDA, which is for nVidia cards only, might it be worthwhile to write the code in OpenCL or similar?

2009-09-06, 01:25   #121
nucleon
Mar 2003
Melbourne
5×103 Posts

Good call not to code for the PS3. The latest PS3 release can't run Linux, so by my extrapolation Sony has put up significant barriers to running third-party code.

-- Craig

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.