mersenneforum.org  

Go Back   mersenneforum.org > Great Internet Mersenne Prime Search > Software

Reply
 
Thread Tools
Old 2021-07-18, 21:07   #45
Mysticial
 
Mysticial's Avatar
 
Sep 2016

5308 Posts
Default

Worth mentioning that using balanced representation cancellation puts you squarely into the "works for average case, but not all cases" category. It is no longer 100% reliable because it's always possible to construct an input that does not destructively cancel out.

For library writers, this is considered a taboo. For more specialized applications like LL where you know you're unlikely to hit those pathological inputs, it's fine. Throw some checksums over it, and it can be made at least as reliable as the hardware itself even if mathematically it isn't 100% provably correct.

But even without balanced representation, FFTs still destroy NTTs for smaller products that fit into cache in terms of pure speed.

Last fiddled with by Mysticial on 2021-07-18 at 21:08
Mysticial is offline   Reply With Quote
Old 2021-07-18, 21:54   #46
ewmayer
2ω=0
 
ewmayer's Avatar
 
Sep 2002
Rep├║blica de California

101101100011012 Posts
Default

Quote:
Originally Posted by Mysticial View Post
Worth mentioning that using balanced representation cancellation puts you squarely into the "works for average case, but not all cases" category. It is no longer 100% reliable because it's always possible to construct an input that does not destructively cancel out.
One of my college thermodynamics professors used the following example to illustrate some of the counterintuitive aspects of statistical thermodynamics. He was standing in front of a blackboard and holding a piece of chalk, dropped the chalk and watched it shatter into pieces on the floor, then said:

"According to statistical thermodynamics, there is a nonzero probability that that piece of chalk will spontaneously reassemble itself and leap back up into my hand."

But instead of waiting for such to occur, he went to the blacboard and grabbed a fresh piece of chalk with with to finish the day's lecture.

Our present use case is not nearly so extreme, but as you note, as long as we have reliable error-checking mechanisms, we need not worry about low-probability pathological inputs, we simply put in place procedures to deal with them as they come. Hit a dangerously high roundoff or Gerbicz-check error? Rerun from last checkpoint file and see if it recurs. If so, take the last-checkpoint residue and randomly rotate it, or rerun at slightly larger FFT length, or what-have-you.
ewmayer is offline   Reply With Quote
Old 2021-07-19, 15:44   #47
moytrage
 
Jul 2021

3·11 Posts
Default

Right now trying to implement balanced representation in my FFT code. Still made some bug that I can't catch out, searching for it.

But already can tell in advance that there is not much point in doing this improvement inside my code. Because I'm not planning to opensource my library, as it was only a part of learning process.

Already what is necessary was answered for my self by people on this thread. For example I'm pretty sure that after implementing this balanced representation I can significantly rise number of used bits from 4 bits per word to desired 17-19 bits. And this rise of bits definitely will give great speedup to my FFT. So thanks for pointing out balanced representation.

After this speedup finally my FFT will become about the same speed as my NTT. I'll write back if I manage to implement correctly balanced representation and catch current bug.
moytrage is offline   Reply With Quote
Old 2021-07-20, 08:01   #48
moytrage
 
Jul 2021

3·11 Posts
Default

I observe very strange behavior.

After implementing balanced representation, for some reason I observe following behavior:

For example for 2^18 bytes input numbers there is wrong multiplication result if I use 15 and less bits per word, although observed error is very small. Then for 16 17 18 19 bits per word I get correct result and for more than 19 bits I get wrong result again.

So for mid-range of bits per word I get correct result and for smaller and larger bits I get wrong result.

For large bits I can understand that there might be a large error. By for small-range values I don't understand why its happenning.

It might be a bug in my code, that I can't find where it is. Or some other reason that I can't explain why. If it is a bug then I can't explain why I get correct results for part of vector - for example there are 1000 elements in total and I get 230 elements in the beginning of vector with correct value, and 231th value is incorrect.

Also if it is a bug then it will be unexplainable why for 16 17 18 19 bits I get correct answer. Strange that some bug might be not critical bug, that in some cases gives correct result.
moytrage is offline   Reply With Quote
Old 2021-07-20, 12:42   #49
axn
 
axn's Avatar
 
Jun 2003

3·17·101 Posts
Default

One thing I could think of: When you balance the most significant word, you might have to carry out a 1 into the next word. Whether or not this needs to be done could change based on the bits per word.

There might be other corner cases related to negative numbers as well.
axn is online now   Reply With Quote
Old 2021-07-20, 18:06   #50
jasonp
Tribal Bullet
 
jasonp's Avatar
 
Oct 2004

3·1,181 Posts
Default

Quote:
Originally Posted by moytrage View Post
Number 2^29 is not 2^17 words each 12 bits, but 2^29/12=2^25.4 words of 12 bits each.
Whoops, yes I thought the transform size I was getting was too small.

The other thing to watch out for with balanced representation is that it need not be unique. In theory with B bits per word all of your array entries x_i should have 2^(B-1) <= x_i < 2^(B-1) , but your carry propagation can subtract 1 from the next word in the array and add 2^B to x_i, or add 1 and subtract 2^B from x_i. This leads to array entries that may be too large or too small but can still represent a multiple-precision modulus exactly. But the carry propagation must be smart enough to expect extra work converting from balanced representation, and the final carry out of the leading array entry has to be handled carefully because it participates in the modulo operation.

As the LL test performs its final iterations the last few residues have long strings of largest or smallest possible x_i and this increases the odds of a bug or an overflow spoiling the computation

Last fiddled with by jasonp on 2021-07-20 at 18:10
jasonp is offline   Reply With Quote
Old 2021-07-21, 05:55   #51
moytrage
 
Jul 2021

418 Posts
Default

Quote:
Originally Posted by axn View Post
One thing I could think of: When you balance the most significant word, you might have to carry out a 1 into the next word.
Yes, I do this already. From the highest part I propagate carry with balancing algo untill carry is 0.

The strange thing is that I get following picture: for example for 2^18-bytes large numbers for 14 bit words I get first 10% of answer correctly, then next word is errored, significantly errored. For 15 bit words there first 30% correct and then error for 16 bits there first 60% correct and then error. And for 17 bits everything is correct. For 18, 19 bits also all is correct. And then for 20 bits I get errors due to overflow of 0.5 error limit, this case is correct case of the error.

All numbers and percents above I took approximately, I don't remember by heart what where exact numbers but they looked approximately like those.

Last fiddled with by moytrage on 2021-07-21 at 06:03
moytrage is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
PRP on gpu is faster that on cpu indomit Information & Answers 4 2020-10-07 10:50
faster than LL? paulunderwood Miscellaneous Math 13 2016-08-02 00:05
My CPU is getting faster and faster ;-) lidocorc Software 2 2008-11-08 09:26
Faster way to do LLT? 1260 Miscellaneous Math 23 2005-09-04 07:12
Faster than LL? clowns789 Miscellaneous Math 3 2004-05-27 23:39

All times are UTC. The time now is 08:22.


Tue Oct 26 08:22:01 UTC 2021 up 95 days, 2:51, 0 users, load averages: 1.85, 1.96, 1.93

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.