mersenneforum.org  

Old 2009-11-15, 23:16   #1
jasonp
Tribal Bullet
 
Oct 2004

3³·131 Posts
The prime-crunching on dedicated hardware FAQ (II)

Q: What is this?

The original prime-crunching-on-dedicated-hardware FAQ was written in the middle of 2008, and the state of the art in high-performance graphics cards appears to be advancing at a rate exceeding Moore's law. At the same time, the libraries for running non-graphics code on one of those things have advanced to the point where there's a fairly large community of programmers involved, both working and playing with Nvidia cards (see here). Hell, even I'm doing it. So we're going to need a few modifications to the statements made in the original FAQ about where things are going.

Q: So I can look for prime numbers on a GPU now?

Indeed you can. There is an active development effort underway to modify one of the standard Lucas-Lehmer test tools to use the FFT code made available by Nvidia in their cufft library. If you have a latter-day card, e.g. a GTX 260 or better, then you can do double-precision floating point arithmetic in hardware, at 1/8 the rate the card manages in single precision. Even with that handicap, such a card has so much floating point firepower that it can manage respectable performance.
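
To make the division of labor concrete, here is a minimal sketch (in CUDA) of how one Lucas-Lehmer squaring maps onto cufft: a forward transform, a pointwise complex squaring kernel, and an inverse transform. This is NOT the project's actual code: the kernel and function names are my own, and the IBDWT weighting, rounding, and carry-propagation steps that a real LL test needs are omitted.

Code:
/* One LL squaring step via cufft: illustrative sketch only. */
#include <cufft.h>

/* Pointwise complex squaring in the frequency domain:
 * (a + bi)^2 = (a^2 - b^2) + (2ab)i */
__global__ void pointwise_square(cufftDoubleComplex *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        double a = x[i].x, b = x[i].y;
        x[i].x = a * a - b * b;
        x[i].y = 2.0 * a * b;
    }
}

/* One iteration of s <- s^2 - 2 (mod 2^p - 1). 'data' holds the
 * signal in a balanced-digit representation on the device. */
void ll_squaring_step(cufftHandle plan, cufftDoubleComplex *data, int n)
{
    cufftExecZ2Z(plan, data, data, CUFFT_FORWARD);
    pointwise_square<<<(n + 255) / 256, 256>>>(data, n);
    cufftExecZ2Z(plan, data, data, CUFFT_INVERSE);
    /* ...then scale by 1/n, round each element to the nearest
     * integer, propagate carries, and subtract 2 (all omitted). */
}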

Q: So how fast does it go?

It's a work in progress, but with a top-of-the-line card the current speed seems to be around what one core of a high-end PC can achieve.

Q: That result is not very exciting. What about the next generation of high-end hardware from Nvidia?

The next generation of GPU from Nvidia, Fermi, promises much better double-precision performance (whitepaper here). Fermi will also be quite interesting from another viewpoint: 32-bit integer multiplication looks like it will be a very fast operation on that architecture, which makes integer FFTs with respectable performance a possibility.
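
For a feel of what an integer FFT leans on, here is a sketch of the modular butterfly at the heart of a number-theoretic transform. The names and the choice of modulus are my own illustration, not anything Fermi-specific; the point is that each butterfly costs one 32x32->64-bit multiply plus reductions, exactly the operation Fermi promises to make cheap.

Code:
typedef unsigned int u32;
typedef unsigned long long u64;

#define NTT_P 2013265921u   /* 15*2^27 + 1, a common NTT prime < 2^31 */

/* Modular multiply; production code would use Montgomery or Barrett
 * reduction instead of the (slow) 64-bit remainder. */
__device__ u32 mulmod(u32 a, u32 b)
{
    return (u32)(((u64)a * b) % NTT_P);
}

/* One radix-2 butterfly: (x, y) <- (x + w*y, x - w*y) mod p.
 * Since p < 2^31, the 32-bit sums below cannot overflow. */
__device__ void butterfly(u32 *x, u32 *y, u32 w)
{
    u32 t = mulmod(*y, w);
    u32 a = *x + t;
    if (a >= NTT_P) a -= NTT_P;
    u32 b = *x + NTT_P - t;     /* keep the subtraction nonnegative */
    if (b >= NTT_P) b -= NTT_P;
    *x = a;
    *y = b;
}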

Q: Does this mean you'll stop being a naysayer on this subject?

If you read the first few followup posts to the original FAQ, and between the lines of this one, you'll gather that I'm somewhat skeptical that the overall progress of any prime-finding project stands to benefit from porting the computational software to a graphics card. Much of the interest in doing so stems from three observations:

- Other projects benefit greatly from GPUs, far out of proportion to the number of GPUs they use.

- Most otherwise-idle PCs also have an otherwise-idle graphics card, so using it would amount to essentially a 'free lunch'.

- If a super-powered-by-GPU version of the code existed, buying a super-powered card would make your individual contribution more valuable.

In the case of projects like GIMPS, what 'other projects do' is immaterial. It's not that other projects have smart programmers and we don't; it's that their hardware needs are different. Further, GPU code is only a free lunch as long as resources are not diverted away from the mainstream project to develop it. As long as somebody volunteers to do the work, there's no harm in trying. But in all the years the crunch-on-special-hardware argument has raged, only in the last few months have GPU programming environments stabilized to the point where someone actually stepped forward.

As to your individual contribution: unless you have a big cluster of your own (thousands of machines) to play with, no amount of dedicated hardware is going to change the fact that 1000 strangers running Prime95 in the background will contribute more than you can ever hope to. If that weren't the case, it wouldn't be distributed computing.

So, long story short, I'm still a buzzkill on this subject.

Last fiddled with by jasonp on 2011-01-02 at 19:38 Reason: add link to old FAQ
Old 2009-11-16, 02:41   #2
msft
 
Jul 2009
Tokyo

1142₈ Posts

Hi, jasonp

Thank you for everything,
Old 2009-11-16, 03:44   #3
lfm
 
Jul 2006
Calgary

5²·17 Posts

Quote:
Originally Posted by jasonp
Q: So I can look for prime numbers on a GPU now?

Indeed you can. There is an active development effort underway to modify one of the standard Lucas-Lehmer test tools to use the FFT code made available by Nvidia in their cufft library. If you have a latter-day card, e.g. a GTX 260 or better, then you can do double-precision floating point arithmetic in hardware, at 1/8 the rate the card manages in single precision. Even with that handicap, such a card has so much floating point firepower that it can manage respectable performance.
Note that the GTX 260M and 280M (for laptops mostly, the M is important) do not have double precision and are NOT supported.

Last fiddled with by lfm on 2009-11-16 at 04:08
Old 2009-11-18, 05:35   #4
__HRB__
 
Dec 2008
Boycotting the Soapbox

2⁴·3²·5 Posts

Quote:
Originally Posted by xkey
I'd really like to see a few 8192 (or bigger) bit registers in the upcoming incarnations from Intel/Amd/IBM. I know Intel is slowly headed there with AVX, but not fast enough for some problems I need solved in a quad or octo chip box.
What kind of problems are those?

I think large registers with complex instructions are a mistake. Top-level algorithms can only be chunked into power-of-two sub-problems until you hit the register size. Need 17-bit integers? They have to ride in 32-bit lanes: waste ~50%. Need floats with a 12-bit exponent and 12-bit mantissa? Waste ~50%.

Therefore, I'd rather have something really simple: say, a 64x64 bit-matrix (with instructions to read/write rows or columns), an accumulator, and maybe two other registers like on the 6502 (now, that was fun programming!), plus a 4096-bit instruction cache and 2-cycle 8-bit instructions (load/store + 8 logical + 8 permute instructions + control flow), so that neighboring units can conflict-free peek/poke each other one clock out of sync.

Then put 16K of these on a single chip, with a couple of F21s sitting on the edges for I/O control.
Old 2010-05-29, 01:12   #5
nucleon
 
Mar 2003
Melbourne

1003₈ Posts

Quote:
Originally Posted by jasonp
Q: So how fast does it go?

It's a work in progress, but with a top-of-the-line card the current speed seems to be around what one core of a high-end PC can achieve.
Umm, this needs to be updated.


Taking some figures from the forum (and my own machine, a Core i7 930), I get the following measurements, in sec/iter:

                                2048k FFT   4096k FFT
PS3                             0.084       0.194
GTX 260                         0.0106      0.0218
Core i7 930 (single core)       0.0363      0.0799
GTX 480, CUDA 3.0               0.00547     0.0104
Dual-socket hex-core 3.33GHz    0.00470     -
GTX 480, CUDA 3.1               0.00466     0.00937

A top-of-the-line video card appears to be roughly 8 times as fast as a single core (0.0363/0.00466 ≈ 7.8, comparing one i7 930 core against the GTX 480 under CUDA 3.1), depending on which video card/CPU combination is compared.

Or you could say a single GTX 480 is similar to using the full CPU cycles of a dual-processor hex-core 3.33GHz Xeon.

I'm extremely positive this is only going to get better.

-- Craig
Old 2010-06-01, 19:13   #6
henryzz
Just call me Henry
 
"David"
Sep 2007
Cambridge (GMT/BST)

5,869 Posts

Do any of those figures for GPUs use any CPU cycles as well?
Old 2010-06-01, 22:58   #7
frmky
 
Jul 2003
So Cal

2³·3²·29 Posts

Very little. About 2%.
Old 2010-06-10, 02:29   #8
Commaster
 
Jun 2010
Kiev, Ukraine

3·19 Posts
AMD Radeon GPU

Speaking of GPU crunching... has everybody forgotten AMD?
According to this, the new cards do support the required double-precision floating point.
Old 2010-08-09, 20:15   #9
ewmayer
2ω=0
 
Sep 2002
República de California

10110101101011₂ Posts

I am slated to get a new laptop at work by year's end ... it would be cool if it offered the possibility of doing some GPU code-dev work on the side. The 2 different GPUs on offer are

512 MB NVidia NVS 3100M

512 MB NVidia Quadro FX 1800M

Note that the latter is only available in the "ultra high performance" notebook, which requires justification and manager approval.

Here are the minimal requirements for me to spend any nontrivial time reading-docs/installing-shit/playing-with-code, along with some questions:

1. The software development environment (SWDE) needs to be somewhat portable, in the sense that I don't want to write a whole bunch of GPU-oriented code which then only works on one model or family of GPUs.

Q: Does this mean OpenCL?

2. All systems run Windows 7 Professional edition. Is OpenCL available there? If so, is it integrated with Visual Studio, with a Linux-emulation environment (e.g. Wine), or what?

3. The SWDE must support double-precision floating-point arithmetic, even if the GPU hardware does not. If DP support is via emulation, I need to have reasonable assurance that the timings of code run this way at least correlate decently well with true GPU-hardware-double performance.

4. It is preferred that the GPU have DP support - how do the above 2 GPUs compare in this regard?

And yes, I *know* "all this information is available" out there on the web somewhere, but I've found no convenient FAQ-style page on the nVidia website which answers more than one of the above questions, so I figured I'd ask the experts. I simply do not have time to read multiple hundred-page PDFs in order to try to glean answers to a few basic questions.

Thanks in advance,
-Ernst
Old 2010-08-09, 22:27   #10
frmky
 
Jul 2003
So Cal

2³·3²·29 Posts

Quote:
Originally Posted by ewmayer
512 MB NVidia NVS 3100M
512MB NVidia Quadro FX 1800M
According to the CUDA programmer's guide, Appendix A, these are both Compute Capability 1.2 devices, which per Appendix G means they do not support double precision. To address your questions...

1. Non-DP CUDA code will work on any current nVidia device. The goal of OpenCL is that the same code can be compiled for multiple devices, but for now it usually requires some per-device tweaking.

2. nVidia releases SDKs for both Windows (Visual Studio) and Linux.

3. All DP calculations are demoted to SP for Compute 1.2 devices and below. New to version 3 of the CUDA toolkit is the elimination of DP emulation in software. This is a good thing as it wasn't very reliable anyway and it did not give realistic timings.

4. As above, no and no. Neither card, nor any of their mobile cards for that matter, can be used to develop DP code. For (relatively) inexpensive DP code development, pick up a GTX 460 or GTX 465. The GTX 460 is faster for DP code and less expensive, but generates more heat. The GTX 465 is faster for SP code.
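
As an aside, if you're not sure what a given card supports, the CUDA runtime will tell you. A minimal sketch using the standard cudaGetDeviceProperties() call; compute capability 1.3 or higher means double precision in hardware, below that (like the 1.2 parts above) there is none:

Code:
#include <cstdio>
#include <cuda_runtime.h>

int main(void)
{
    int n = 0;
    cudaGetDeviceCount(&n);
    for (int i = 0; i < n; i++) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        /* DP hardware first appeared at compute capability 1.3 */
        int has_dp = (prop.major > 1) ||
                     (prop.major == 1 && prop.minor >= 3);
        printf("Device %d: %s, compute %d.%d -> %s\n", i, prop.name,
               prop.major, prop.minor,
               has_dp ? "DP in hardware" : "no DP");
    }
    return 0;
}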
Old 2010-08-09, 23:09   #11
ewmayer
2ω=0
 
Sep 2002
República de California

11627₁₀ Posts

Quote:
Originally Posted by frmky
4. As above, no and no. Neither card, nor any of their mobile cards for that matter, can be used to develop DP code. For (relatively) inexpensive DP code development, pick up a GTX 460 or GTX 465. The GTX 460 is faster for DP code and less expensive, but generates more heat. The GTX 465 is faster for SP code.
Many thanks, frmky. I've been told that one needs a full-sized case system with a sufficient-wattage power supply to run one of these high-end cards, but I wonder: can one also get them in a standalone external format? I have 2 laptops at home: a 3-year-old Thinkpad running XP with VS2005 installed, and a 1-year-old MacBook running Linux 4.2 ... I would be happy to get one of the above cards in an external add-on format, but really don't fancy the idea of buying a full-sized PC system anymore ... I'm trying to keep the amount of compute hardware in my home to a small footprint.