2017-08-02, 17:40  #1 
Bemusing Prompter
"Danny"
Dec 2002
California
2·3·397 Posts 
does half-precision have any use for GIMPS?
I noticed that the newer GPUs now support half-precision. Can FP16 be used for trial factoring? Or does it have to be at least single-precision?
Last fiddled with by ixfd64 on 2020-10-17 at 00:28 Reason: wrong word in title 
2017-08-02, 18:27  #2 
"/X\(‘‘)/X\"
Jan 2013
3·977 Posts 
Not really. Sure, you could chain FP16 operations to do the same work, but doing so would mean many more multiply operations.
FP64 would be a benefit, as fewer multiplications would be required, but FP64 is crippled on consumer cards. With the latest generations doing FP16, I was hoping for FP64 silicon that splits into FP32/FP16 units when needed, but apparently it's still FP32/FP16 silicon with an FP64 unit on the side. 
2017-08-03, 11:30  #3 
Feb 2016
UK
419 Posts 
Correct me if I'm wrong, but:

- FP64 = double precision: what we historically need, but no longer commonly offered at any decent performance level.
- FP32 = single precision: what is mostly offered.
- FP16 = half precision: more common now with the so-called deep-learning stuff; double the FP32 rate where supported.

I know it isn't that simple, but is it possible to use multiple FP32 operations to give the same result? How much overhead would be expected over a native FP64 implementation? I take it you can't just do two FP32 operations to replace a single FP64 operation...

Side comment: has anyone looked at Project 47 from AMD? They're selling it as a petaflop in a rack, but that figure is for FP32, with a 1/16 FP64 rate. I had to burst some fanboy bubbles on another forum by pointing that out. 
2017-08-03, 12:32  #4 
Undefined
"The unspeakable one"
Jun 2006
My evil lair
6,143 Posts 
Yes, but ... you'd need at least four FP32 multiplies to produce a double-length result. And even then you only get 2 × 23 bits of precision, still short of the 52 bits of precision of a single native FP64 multiply.
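The four-partial-product scheme described above can be sketched in a few lines. This is a toy model only: plain Python integers stand in for 24-bit FP32-style fractions, and `wide_mul` is a hypothetical name, not anything from an actual GIMPS client.

```python
# Sketch: a double-width product built from four narrower multiplies,
# mirroring the "four FP32 multiplies" scheme described above.
# Python ints model 24-bit fractions; real FP32 would also lose low bits.
B = 1 << 24  # split point: 24-bit halves, like an FP32 fraction

def wide_mul(a, b):
    a_hi, a_lo = divmod(a, B)
    b_hi, b_lo = divmod(b, B)
    # four partial products, recombined with shifts
    return (a_hi * b_hi) * B * B + (a_hi * b_lo + a_lo * b_hi) * B + a_lo * b_lo

x, y = 0xDEADBEEFCAFE, 0x123456789ABC  # two 48-bit operands
assert wide_mul(x, y) == x * y
```

With exact integers the recombination is lossless; the catch in floating point, as noted above, is that each partial product is itself rounded to 24 bits, which is where the precision shortfall comes from.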

2017-08-03, 12:33  #5 
Romulan Interpreter
Jun 2011
Thailand
9452_{10} Posts 
You will need something like 7.5 or 8.5 FP32 operations to do one FP64 operation, with a little to spare. There is a discussion somewhere around here. Therefore any hardware with a DP:SP ratio lower than 1:8 is not interesting from the DP point of view. Something like a Titan, at 1:3, now you're talking. Gaming cards at 1:12 or 1:16, or even 1:32, just waste the silicon, and I can't understand why they have DP at all: you would be faster implementing a school-grade algorithm that uses 3 SP to simulate one DP.

Why can't you do it with two? Well, consider that you need 4 single-digit multiplications to do a 2-digit multiplication (unless you use Karatsuba, which needs 3, plus some additions). The trick is that you cannot split one DP (FP64) into two SP (FP32). One SP has a sign bit, 8 bits of exponent, and 23+1 bits of fraction. One DP has a sign bit, 11 bits of exponent, and 52+1 bits of fraction. So putting two SP together gets you only 48 bits of fraction, despite the fact that you already have 16 bits of exponent. You therefore need 3 SP to cover the range of 1 DP, but things are not so simple: you will have a lot of headache with denormals/subnormals, etc. It is not like integers, where you just split them and multiply. Here there is a lot of overhead. To do DP with HP (FP16) you waste more time on the overhead than on the multiplication itself. It could be fun to try, but it would be extremely slow. 
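The 48-bits-from-two-SP shortfall is easy to see numerically. A minimal sketch, assuming NumPy for its `float32` type: split one float64 into a leading float32 plus a float32 correction term and check that a residue remains, because 24 + 24 = 48 significant bits falls short of float64's 53.

```python
import numpy as np

# Represent one float64 as a sum of two float32 values ("double-single").
x = np.float64(1.0) / np.float64(3.0)      # 1/3 has an infinite binary expansion
hi = np.float32(x)                          # leading ~24 bits of the fraction
lo = np.float32(x - np.float64(hi))         # next ~24 bits
approx = np.float64(hi) + np.float64(lo)

err = abs(x - approx)
print(err)  # small but nonzero: the part two float32 fractions cannot hold
```

A third float32 term would be needed to cover the remaining bits, which is exactly the "3 SP per DP" count above, before even considering exponent range and subnormal headaches.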
2017-08-03, 13:23  #6 
Feb 2016
UK
419 Posts 
Thanks for the responses. I really wish I'd paid more attention at school; maybe then I wouldn't have hit a math wall when I did.

Is it possible to look at it from the other direction? If you have a given precision level, can you compensate for it in other ways? I vaguely recall that in the old days x87 was mainly used, and when lower-precision x64 came along for general use, that was compensated for by using bigger FFT sizes for a given test case. Could FP32 be used effectively with bigger FFTs? Or is there some other fundamental limit in the rounding that prevents this? I know bigger FFTs would take more calculation steps, but we would then be tapping into a faster FP32 rate to provide more of them.

Apologies if I'm going over old ground. I expect those who actually know enough to do something useful with it have considered this in the past. I just wish to expand my understanding, even if only at a high-level overview, of why it can or can't work. 
2017-08-03, 13:31  #7 
Undefined
"The unspeakable one"
Jun 2006
My evil lair
6143_{10} Posts 

2017-08-04, 18:33  #8 
Dec 2014
11111111_{2} Posts 
two-level multiply
Using FP32 one could, for example, build a 4096-bit multiplier,
and then do the 70,000,000-bit multiplies using that 4096-bit multiplier. This has probably been discussed on here before as well. 
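The two-level idea can be sketched as treating the big number as digits in base 2^4096 and doing schoolbook multiplication of those digits. In this toy sketch, Python's built-in integers stand in for the hypothetical FP32-based 4096-bit multiplier; `two_level_mul` is an illustrative name, not real GIMPS code.

```python
# Two-level multiply: split into 4096-bit limbs, schoolbook-combine them.
LIMB = 4096  # the size of the hypothetical small multiplier

def split(n):
    """Decompose n into base-2^LIMB digits, least significant first."""
    digits = []
    while n:
        digits.append(n & ((1 << LIMB) - 1))
        n >>= LIMB
    return digits or [0]

def two_level_mul(a, b):
    da, db = split(a), split(b)
    result = 0
    for i, x in enumerate(da):
        for j, y in enumerate(db):
            # each x*y is one invocation of the "small" 4096-bit multiplier
            result += (x * y) << (LIMB * (i + j))
    return result

a, b = 3**3000, 5**2500
assert two_level_mul(a, b) == a * b
```

The quadratic loop over limb pairs is exactly the inefficiency pointed out in the next reply: schoolbook combination at the outer level throws away the FFT's asymptotic advantage.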
2017-08-05, 21:49  #9 
Undefined
"The unspeakable one"
Jun 2006
My evil lair
6143_{10} Posts 
That would be really inefficient. There are many such schemes we could employ to expand the size of the multiplier, but they get progressively more inefficient with each layer you add to the process.

2017-08-05, 22:12  #10  
∂^{2}ω=0
Sep 2002
República de California
11632_{10} Posts 
Specialized hardware for that sort of thing is a significant niche sector of the microprocessor market, but for various reasons (price, wideness of use, use of fixed-point, etc.) has not been the target of a GIMPS client. 

Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
translating double to single precision?  ixfd64  Hardware  5  2012-09-12 05:10 
Accuracy and Precision  davieddy  Math  0  2011-03-14 22:54 
exclude single core from quad core cpu for gimps  jippie  Information & Answers  7  2009-12-14 22:04 
so what GIMPS work can single precision do?  ixfd64  Hardware  21  2007-10-16 03:32 
4 checkins in a single calendar month from a single computer  Gary Edstrom  Lounge  7  2003-01-13 22:35 