mersenneforum.org The prime-crunching on dedicated hardware FAQ

 2008-05-16, 10:59 #2 davieddy     "Lucan" Dec 2006 England 6,451 Posts Fixed Point v Floating Point My understanding of the "Fast" in "Fast Fourier Transform" is that it replaced N^2 multiply/accumulates with N multiplies and N·log N additions. That being the case, isn't fixed point potentially more efficient (gate- and energy-consumption-wise)? David
 2008-05-16, 11:16 #3 jasonp Tribal Bullet     Oct 2004 3,489 Posts In the general case, you are replacing a dense matrix multiply (N^2 complex multiply-add operations) with a series of recursive sparse matrix multiplies. The decomposition at size N involves N complex multiply-adds and two dense multiplies of size N/2, which are then decomposed recursively in the same way. For N = 2^k the recursion happens k times, so that the final operation count is O(k·2^k) = O(N·log N). There are multiplies at all levels of the recursion, many of which are trivial and removed in an optimized implementation. A fixed-point hardware implementation has all of the same work to do as a floating point implementation, with the added burden of error analysis sufficient to guarantee that the choice of fixed-point mantissa is large enough to represent the answer correctly. My guess is that the kind of dynamic range (the range of values in transform space that a big multiply generates) is so large that a fixed-point implementation would actually need more logic than a floating point implementation.
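A minimal recursive sketch of that decomposition (in Python, illustrative only; nothing like the optimized implementations GIMPS actually uses):

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT: N complex multiply-adds per level of the
    recursion, log2(N) levels, hence O(N log N) total work."""
    n = len(x)
    if n == 1:
        return x[:]
    # Decompose into two size-N/2 transforms (even/odd indices),
    # which are decomposed recursively in the same way.
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # One complex multiply (the "twiddle factor") and two adds per k.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```

Note the trivial twiddle factors (k = 0 gives exp(0) = 1) that an optimized implementation would remove, as mentioned above.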
 2008-05-16, 11:31 #4 davieddy     "Lucan" Dec 2006 England 6,451 Posts THX for the prompt response. At first glance you seem to be saying that floating v fixed was more clear cut than VHS v Betamax. Thanks also for this comprehensive FAQ David
2008-05-17, 11:41   #5
davieddy

"Lucan"
Dec 2006
England

6,451 Posts

Quote:
 Originally Posted by jasonp A fixed-point hardware implementation has all of the same work to do as a floating point implementation, with the added burden of error analysis sufficient to guarantee that the choice of fixed-point mantissa is large enough to represent the answer correctly. My guess is that the kind of dynamic range (the range of values in transform space that a big multiply generates) is so large that a fixed-point implementation would actually need more logic than a floating point implementation.
I would like to risk disputing some of this.

One thing fixed point addition doesn't have to do is interpret the
floating point format and shift things appropriately.
Error analysis is not difficult. Determine (empirically if
all else fails) how many binary places of precision are needed
to prevent the random walk of roundoff errors accumulating to
slip an integer in the final result, and how many bits are needed
to prevent overflow. No need to check while the test is in progress:
just trust the statistics and tolerate the risk of fatal error you opt for.

I would be interested to know how 64 bits of fixed point could perform.
(How many input bits can each element take?
Where is the "fixed binary point"?).

David

PS Isn't "fixed point mantissa" self-contradictory?

Last fiddled with by davieddy on 2008-05-17 at 12:23
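For the overflow half of that analysis, a crude worst-case (rather than statistical) bound is easy to sketch. The word sizes and function name below are illustrative assumptions, not taken from any actual implementation:

```python
def bits_for_convolution(n, b):
    """Worst-case bits needed to hold one output word of a length-n
    convolution of unsigned b-bit inputs: each output is a sum of at
    most n products, and each product is at most (2^b - 1)^2."""
    max_output = n * (2**b - 1) ** 2
    return max_output.bit_length()

# With 2^20 input words of 16 bits each, the worst case fits in 52 bits,
# leaving 12 bits of a 64-bit accumulator as headroom:
# bits_for_convolution(2**20, 16) -> 52
```

The statistical random-walk argument above would justify keeping fewer guard bits than this deterministic bound, at the cost of a tolerated risk of fatal error.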

2008-05-17, 14:30   #6
jasonp
Tribal Bullet

Oct 2004

3×1,163 Posts

Quote:
 Originally Posted by davieddy One thing fixed point addition doesn't have to do is interpret the floating point format and shift things appropriately. Error analysis is not difficult. Determine (empirically if all else fails) how many binary places of precision are needed to prevent the random walk of roundoff errors accumulating to slip an integer in the final result, and how many bits are needed to prevent overflow. No need to check while the test is in progress: just trust the statistics and tolerate the risk of fatal error you opt for. ... PS Isn't "fixed point mantissa" self contradictory?
Yes, fixed point would just involve an accumulator that is shifted a fixed number of places after an arithmetic operation. An upper bound on the dynamic range is easy; the first row of the DFT matrix is all +1, and the other matrix entries are all values in [-1,1], so the largest absolute value you can expect in transformed space is the sum of the absolute values of the signal elements, which would then be squared during the pointwise convolution.

What's not clear to me is how you can determine the number of extra bits in a register that are needed to avoid catastrophic additive cancellation, where quantities that are almost equal would get flushed to zero when subtracted in a fixed-point implementation, and then go on to propagate large errors in all the places those flushed results are used. Floating point is much more resistant to this problem: you get full accuracy in the mantissa a larger fraction of the time. There are a few papers available on fixed-point FFT hardware, but I find that many of their intended applications are not very demanding, and the designers opt for fixed point as a cost-saving measure.
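The cancellation effect is easy to demonstrate with a toy quantization; the 20-bit fraction here is an arbitrary illustrative choice:

```python
def to_fixed(x, frac_bits):
    """Quantize x onto a fixed-point grid with frac_bits fractional bits."""
    return round(x * 2**frac_bits) / 2**frac_bits

# Two nearly equal quantities whose difference is the quantity of interest.
a, b = 1.0000001234, 1.0
exact = a - b                       # about 1.234e-7 in floating point

# With 20 fractional bits the grid spacing is ~9.5e-7, so both values
# snap to the same grid point and the difference is flushed to zero:
fa, fb = to_fixed(a, 20), to_fixed(b, 20)
flushed = fa - fb                   # 0.0: the signal is gone
```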

 2008-05-17, 17:24 #7 RMAC9.5     Jun 2003 3²·17 Posts Jasonp, The frustration that some of us feel when we are told that GPUs or special hardware won't work because they don't support DP floating point mathematical operations is that some of these product developers claim that their products do support DP mathematical operations. Clearspeed is a good example. Clicking on the link you provided takes me to a web page where DP floating point support is listed as a feature. Recent ATI graphics cards (ATI 3xxx & higher) also claim DP floating point support. GIMPS is my main hobby. I have a small PC farm at my house. Ten Windows PCs ranging from an AMD K6-III to an AMD64 X2 6400+. All of these PCs have AGP or better graphics cards in them. If Prime95 or one of its cousins supported an ATI 3850 or similar graphics card with good performance, I could inexpensively more than double my current GIMPS contribution.
2008-05-17, 17:29   #8
davieddy

"Lucan"
Dec 2006
England

6,451 Posts

Quote:
 Originally Posted by jasonp Yes, fixed point would just involve an accumulator that is shifted a fixed number of places after an arithmetic operation.
I'm sure our posting style will converge eventually.
Meantime I would say that the "fixed point" is
located in the programmer's brain rather than the hardware.

Last fiddled with by davieddy on 2008-05-17 at 17:32
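That point can be made concrete in a few lines of Python: the hardware (here, plain integer arithmetic) never sees a binary point; the Q16.16 convention below lives entirely in the programmer's head. The format choice and helper names are illustrative assumptions:

```python
FRAC = 16  # Q16.16: an implicit binary point 16 bits up; purely a
           # convention maintained by the programmer, not the hardware.

def to_q(x):
    """Encode a real number as a Q16.16 integer."""
    return round(x * (1 << FRAC))

def q_mul(a, b):
    """Multiply two Q16.16 values: integer multiply, then shift the
    accumulator right a fixed number of places."""
    return (a * b) >> FRAC

def from_q(q):
    """Decode a Q16.16 integer back to a real number."""
    return q / (1 << FRAC)

# 1.5 * 2.25 carried out entirely in integer arithmetic:
# from_q(q_mul(to_q(1.5), to_q(2.25))) -> 3.375
```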

2008-05-17, 19:11   #9
jasonp
Tribal Bullet

Oct 2004

3·1,163 Posts

Quote:
 Originally Posted by RMAC9.5 Jasonp, The frustration that some of us feel when we are told that GPUs or special hardware won't work because they don't support DP floating point mathematical operations is that some of these product developers claim that their products do support DP mathematical operations. Clearspeed is a good example. Clicking on the link you provided takes me to a web page where DP floating point support is listed as a feature. Recent ATI graphics cards (ATI 3xxx & higher) also claim DP floating point support. GIMPS is my main hobby. I have a small PC farm at my house. Ten window PCs ranging from an AMD K6-III to an AMD64 X2 6400+. All of these PCs have AGP or better graphics cards in them. If Prime95 or one of its cousins supported an ATI 3850 or similar graphic card with good performance, I could inexpensively more than double my current GIMPS contribution.
I've added a little paragraph to question 3 to address this more fully. The difference between Clearspeed's board and the claims of graphics chip manufacturers is that Clearspeed supports DP in hardware as an atomic operation, whereas if modern graphics cards support DP at all, it is emulated, as on Cell (I'd love to be proven wrong on this). Believe it or not, your high-end graphics card cannot support professional 3D rendering either, because state-of-the-art raytracing code like POVRAY requires double precision for stuff like radiosity calculations (this is in the POVRAY FAQ).

Regarding your farm, 1) how many of the graphics cards in your farm are programmable at all? 2) would you be willing to part with X dollars per node for high-end graphics cards that are capable of executing double-precision floating point natively? 3) how many of your friends would do the same? (Clearspeed's boards were $10k each, last time someone asked them) 4) if the goal is to get the most LL tests done across all users, is improving your personal farm's throughput the best use of man-years of George Woltman's time?

2008-05-18, 05:50   #10
RMAC9.5

Jun 2003

3²×17 Posts

Jasonp, Pardon my not quoting the text below that I copied from your answer to my post. I don't post very often and haven't figured out how the quote feature works. I am an applications programmer and understand most of the concepts here but I don't have any experience programming in the PC world or in the detailed hardware world that George, for example, programs in. "Regarding your farm, 1) how many of the graphics cards in your farm are programmable at all? 2) would you be willing to part with X dollars per node for high-end graphics cards that are capable of executing double-precision floating point natively? 3) how many of your friends would do the same? (Clearspeed's boards were $10k each, last time someone asked them) 4) if the goal is to get the most LL tests done across all users, is improving your personal farm's throughput the best use of man-years of George Woltman's time?" 1) Answer - Currently, I only have one graphics card that might be programmable. It is an ATI 1900 PCI-e card with 48 vertex and pixel shaders (using Shader Model 3.0) and 256MB of on-card memory. AMD claims that it supports 128-bit floating point for all shader operations and 64-bit floating point for all HDR rendering operations, but the Shader Model 3.0 programming limitations probably make it a poor choice to program for. 2) Answer - It depends on how big X is and how much performance would be gained.
For example, my Intel PII 400 MHz PC running Windows 98 factors an M48-size number in approximately 21 days. My AMD Opteron 175 PC running Windows XP 64 factors two M48-size numbers every 29 hours. If adding a Sapphire Radeon HD 3850 512 MB AGP graphics card ($160 - $20 mail-in rebate + $7 S/H = $147 on Newegg.com) and a new power supply to my Intel PII 400 PC would make it perform like my AMD Opteron, I would definitely be interested. 3) Answer - I think that many, many people who are still running older hardware for GIMPS would be interested in upgrading their graphics cards if they could contribute to GIMPS by doing so. (Clearspeed, by the way, is currently running a $4995 special, but like most people, $4995 is still much too rich for me.) 4) Answer - I don't think you understood the purpose of my previous post. I am not a gamer and normally purchase inexpensive graphics cards or use the built-in motherboard graphics chips. If I could inexpensively and greatly increase my GIMPS throughput without increasing the number of PCs in my house, I would do it in a heartbeat. I only mentioned the ATI Radeon HD 3850 graphics card in my previous post because it comes in both PCI-e and AGP flavors. It also, according to AMD, has 320 stream processing units which support 128-bit floating point precision for all vertex, geometry, and pixel shader operations. Here is a link to its GPU specifications page: http://ati.amd.com/products/radeonhd3800/specs.htm

Last fiddled with by WraithX on 2016-02-15 at 05:14
2008-05-18, 08:22   #11
sylvester

Dec 2007
RSM, CA, USA

3×13 Posts

Quote:
 Originally Posted by RMAC9.5 I am an applications programmer and understand most of the concepts here... ...processing units which support 128 bit floating point precision...
I'm a nobody in this project, so I can be more direct than the people who are really involved in and want to be polite to all contributors.

RMAC9.5, you are just a crank programmer. In the Math forum there is (or was proposed) a subforum for crank mathematicians who think they understand the math. JasonP is trying to be very polite and not call you names, but you're really pushing it.

128-bit floating point on a GPU is simply a 4×32-bit single-precision floating point representation of a geometric point using homogeneous coordinates [x,y,z,w]. This has nothing to do with double precision (64-bit) or quad precision (true 128-bit).
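The distinction is easy to check: packing four 32-bit lanes gives vector width, not precision, since each lane still carries only a 24-bit significand. A small Python sketch, using `struct` to emulate one 32-bit lane:

```python
import struct

def as_float32(x):
    """Round-trip a Python float (64-bit) through a single 32-bit lane,
    as one component of a GPU vec4 would store it."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

# A perturbation below the 24-bit significand survives in double
# precision but is lost in single; four such lanes side by side
# do not help:
x = 1.0 + 2**-30
assert x != 1.0              # exactly representable in double precision
assert as_float32(x) == 1.0  # rounded away in single precision
```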

If you hadn't written that you are a "programmer and understand" I would just call you confused by the marketing collateral. But you call yourself a programmer and have no idea how basic data types are represented in a processor. This puts you in the category of cranks. I just hope that you aren't programming "applications" for my health insurance or my bank. <Deity> help us!

Go to Dr. Crandall's web site and download "lucdwt.c". Go to the ATI web site and download the OpenGPU specifications. Read both. Do something with them. But please stop haranguing people here with your crank programming ideas.

To the forum moderator: Don't tase me, bro!

