![]() |
|
|
#23 |
|
Tribal Bullet
Oct 2004
356510 Posts |
I would think you could get useful work out of maybe a 2^22 root of two, not all the way to 2^28 like that thread used; there's also the option to use several (maybe 2-8) different moduli with CRT reconstruction, and that would be plenty to run LL all the way up to the current wavefront.
I can also provide non-power-of-two transforms that use the Winograd algorithm. How can we go about finding non-power-of-two roots of 2 by e.g. augmenting the iterated Tonelli-Shanks approach? Would that multiply the number of such roots available? No need to contemplate billion-digit LL tests when consumer GPUs can't even fit one residue right now. Last fiddled with by jasonp on 2018-06-05 at 17:39 |
|
|
|
|
|
#24 | |
|
Sep 2016
22×5×19 Posts |
Quote:
If you increase the search space to transform lengths with many non-power-of-two factors, then it could give enough dimensions of freedom to find more suitable primes. But I don't know how much overhead Winograd's algorithm will add compared to emulating 64-bit arithmetic, or even double-single precision float FFT. |
|
|
|
|
|
|
#25 |
|
Tribal Bullet
Oct 2004
356510 Posts |
I searched through the 30-bit primes and found quite a few that would be suitable for a NTT; all of the following have a root of two whose order is a large power of two and matches the order of the largest power-of-two root of unity modulo the prime:
Code:
p 22600001 primroot 04c8bae8 factors 2^21 5 5 11 dwt-root 008f1b9d order 2^21 p 22b00001 primroot 1889755b factors 2^20 3 5 37 dwt-root 20cf3916 order 2^20 p 23800001 primroot 158a82f1 factors 2^23 71 dwt-root 1efa1d96 order 2^23 p 23a00001 primroot 180eecaf factors 2^21 3 5 19 dwt-root 1e675b3f order 2^21 p 24100001 primroot 0e9595e6 factors 2^20 577 dwt-root 07c59f6d order 2^20 p 25e00001 primroot 01139d1b factors 2^21 3 101 dwt-root 00c2ad24 order 2^21 p 26200001 primroot 16692d18 factors 2^21 5 61 dwt-root 08b905b9 order 2^21 p 26800001 primroot 0d5e196d factors 2^23 7 11 dwt-root 1a3aa741 order 2^23 p 26a00001 primroot 234fdf92 factors 2^21 3 103 dwt-root 1a935819 order 2^21 p 27100001 primroot 0ab4f058 factors 2^20 5 5 5 5 dwt-root 106260cf order 2^20 p 27c00001 primroot 08c83b6c factors 2^22 3 53 dwt-root 035b3c19 order 2^22 p 28c00001 primroot 0e1f4c18 factors 2^22 163 dwt-root 1d204c71 order 2^22 p 2a600001 primroot 1f45b95c factors 2^21 3 113 dwt-root 0f3eefee order 2^21 p 2aa00001 primroot 15362e51 factors 2^21 11 31 dwt-root 177ad8d5 order 2^21 p 2ad00001 primroot 0f069c74 factors 2^20 5 137 dwt-root 1b4d3d3e order 2^20 p 2c200001 primroot 0e28187e factors 2^21 353 dwt-root 0a5266ac order 2^21 p 2c700001 primroot 0a361d1d factors 2^20 3 3 79 dwt-root 25d706dd order 2^20 p 2d000001 primroot 2c18f042 factors 2^24 3 3 5 dwt-root 0a489736 order 2^24 p 2df00001 primroot 01938851 factors 2^20 3 5 7 7 dwt-root 0971c5c8 order 2^20 p 2ee00001 primroot 0d9cd97a factors 2^21 3 5 5 5 dwt-root 04f4a141 order 2^21 p 2fa00001 primroot 14894b0f factors 2^21 3 127 dwt-root 2d0a271a order 2^21 p 2fb00001 primroot 23e1f07c factors 2^20 7 109 dwt-root 1004dbed order 2^20 p 2fd00001 primroot 06280117 factors 2^20 3 3 5 17 dwt-root 270d91f4 order 2^20 p 30d00001 primroot 040d960b factors 2^20 11 71 dwt-root 19b64592 order 2^20 p 31200001 primroot 07f3b74a factors 2^21 3 131 dwt-root 28303766 order 2^21 p 31b00001 primroot 03969c38 factors 2^20 3 5 53 dwt-root 017f4edd order 2^20 p 32b00001 primroot 32525f31 factors 2^20 811 dwt-root 1cb65e06 order 2^20 p 33700001 primroot 05eb3b37 factors 2^20 823 dwt-root 0e1ead7b order 2^20 p 34800001 primroot 05e26177 factors 2^23 3 5 7 dwt-root 26f88c2b order 2^23 p 34b00001 primroot 253b3b21 factors 2^20 3 281 dwt-root 0c040aec order 2^20 p 35800001 primroot 340422f0 factors 2^23 107 dwt-root 32d76a52 order 2^23 p 35a00001 primroot 070ff1f7 factors 2^21 3 11 13 dwt-root 114c150a order 2^21 p 36100001 primroot 34475648 factors 2^20 5 173 dwt-root 1985e108 order 2^20 p 36700001 primroot 328220c0 factors 2^20 13 67 dwt-root 2e3c5dfd order 2^20 p 36c00001 primroot 052f72f3 factors 2^22 3 73 dwt-root 14cbe41b order 2^22 p 36d00001 primroot 246eda9f factors 2^20 877 dwt-root 24803ef1 order 2^20 p 37200001 primroot 029f86ed factors 2^21 3 3 7 7 dwt-root 18b1410e order 2^21 p 37300001 primroot 35f25901 factors 2^20 883 dwt-root 297087ca order 2^20 p 37c00001 primroot 0525cf27 factors 2^22 223 dwt-root 227188ac order 2^22 p 37f00001 primroot 25897b94 factors 2^20 5 179 dwt-root 0fc2a1f0 order 2^20 p 38100001 primroot 1e1d33ed factors 2^20 3 13 23 dwt-root 1eee6a89 order 2^20 p 38400001 primroot 2cf8c37e factors 2^22 3 3 5 5 dwt-root 2ee15c0e order 2^22 p 38a00001 primroot 11060e57 factors 2^21 3 151 dwt-root 1493b30e order 2^21 p 39100001 primroot 348657d9 factors 2^20 11 83 dwt-root 18687f95 order 2^20 p 39600001 primroot 297695e1 factors 2^21 3 3 3 17 dwt-root 29bca8fc order 2^21 p 39f00001 primroot 10cd9c15 factors 2^20 3 3 103 dwt-root 0674a1fa order 2^20 p 3a200001 primroot 351d513e factors 2^21 3 5 31 dwt-root 0fea3e86 order 2^21 p 3a300001 primroot 27856dbe factors 2^20 7 7 19 dwt-root 208205f8 order 2^20 p 3ac00001 primroot 04c47e07 factors 2^22 5 47 dwt-root 372b0a54 order 2^22 p 3b800001 primroot 00e9a248 factors 2^23 7 17 dwt-root 25bcd018 order 2^23 p 3be00001 primroot 29e0eddc factors 2^21 479 dwt-root 29adf206 order 2^21 p 3c100001 primroot 1a135ebb factors 2^20 31 31 dwt-root 06af8b59 order 2^20 p 3c600001 primroot 281f5f45 factors 2^21 3 7 23 dwt-root 24d28be1 order 2^21 p 3e500001 primroot 022f5768 factors 2^20 997 dwt-root 3a935a41 order 2^20 p 3eb00001 primroot 1e94971a factors 2^20 17 59 dwt-root 0a1893a3 order 2^20 p 3ed00001 primroot 3a0abed6 factors 2^20 3 5 67 dwt-root 0d22fb97 order 2^20 |
|
|
|
|
|
#26 | |
|
Sep 2016
22×5×19 Posts |
Quote:
p = 0x23800001 = 595591169 w = 0x1efa1d96 = 519708054 519708054^(2^23) mod 595591169 = 451061951 Also, Wolfram Alpha returns no solution for: PowerMod[2, 1/4, 595591169] So it doesn't even have a 4th root-of-two - let alone a 2^23rd. Last fiddled with by Mysticial on 2018-06-11 at 20:10 |
|
|
|
|
|
|
#27 |
|
Tribal Bullet
Oct 2004
5·23·31 Posts |
I see the problem, I wasn't checking whether the square root was solvable before computing it. Adding a Legendre check and printing in decimal now yields
Code:
p 536903681 primroot 179828132 factors 2^15 5 29 113 dwt-root 283856863 order 2^13 p 568406017 primroot 116384346 factors 2^12 3 3 17 907 dwt-root 176417532 order 2^11 p 590737409 primroot 527340004 factors 2^12 144223 dwt-root 197873606 order 2^11 p 613742593 primroot 81759997 factors 2^11 3 191 523 dwt-root 77279604 order 2^11 p 628021249 primroot 609313535 factors 2^11 3 102217 dwt-root 4276385 order 2^11 p 680058881 primroot 339951669 factors 2^13 5 16603 dwt-root 420691659 order 2^11 p 694614017 primroot 306447105 factors 2^11 17 71 281 dwt-root 239731871 order 2^11 p 705359873 primroot 366883740 factors 2^12 7 73 337 dwt-root 643622083 order 2^11 p 716531713 primroot 88617846 factors 2^11 3 13 8971 dwt-root 519462414 order 2^11 p 751599617 primroot 142635099 factors 2^15 22937 dwt-root 100418175 order 2^11 p 760197121 primroot 613923624 factors 2^12 3 5 12373 dwt-root 629602267 order 2^11 p 771739649 primroot 320084088 factors 2^12 29 73 89 dwt-root 59058360 order 2^12 p 854026241 primroot 174518929 factors 2^11 5 83401 dwt-root 349192115 order 2^11 p 913324033 primroot 577747733 factors 2^11 3 3 3 83 199 dwt-root 4870533 order 2^11 p 921782273 primroot 356301033 factors 2^11 31 14519 dwt-root 272023053 order 2^11 p 940613633 primroot 65730855 factors 2^13 7 47 349 dwt-root 592028551 order 2^13 p 1009606657 primroot 570890730 factors 2^13 3 41081 dwt-root 156965663 order 2^11 p 1011167233 primroot 494276226 factors 2^12 3 19 61 71 dwt-root 533960189 order 2^11 p 1017788417 primroot 420693678 factors 2^11 61 8147 dwt-root 201343990 order 2^11 p 1035491329 primroot 980936777 factors 2^11 3 3 56179 dwt-root 176753051 order 2^11 p 1052508161 primroot 927474728 factors 2^18 5 11 73 dwt-root 763761318 order 2^11 Last fiddled with by jasonp on 2018-06-11 at 21:03 |
|
|
|
|
|
#28 |
|
Just call me Henry
"David"
Sep 2007
Liverpool (GMT/BST)
3·23·89 Posts |
It looks like VOLTA has separated the INT32 and FP32 cores and can use both at once. Would a useful joint Integer/float fft be possible with these low precision types? Some FP64 could be used for added precision in places although consumer cards will be heavily limited there. https://devblogs.nvidia.com/inside-volta/
|
|
|
|
|
|
#29 | |
|
"Mihai Preda"
Apr 2015
145210 Posts |
Quote:
SP & DP FFT has the benefit of being handled by cuFFT; (but not any NTT). SP FFT at 4M+ FFT size, IMO, does not work at all. (Too much loss of precision in the SP FFT twiddles). So that leaves DP + hand-rolled-NTT, which might be worth but a lot of work. BTW, which VOLTA GPU would you suggest buying? |
|
|
|
|
|
|
#30 | |
|
Just call me Henry
"David"
Sep 2007
Liverpool (GMT/BST)
3·23·89 Posts |
Quote:
Is it not possibly to interleave DP and SP to reduce the errors without over-saturating the DP? |
|
|
|
|
|
|
#31 | |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
11110100100002 Posts |
Quote:
|
|
|
|
|
|
|
#32 | |
|
"Mihai Preda"
Apr 2015
145210 Posts |
Quote:
The "bits per word" would be: - M31 alone: about 5 bits/w. - DP + M31: about 18 + 15.5 bits/w. Last fiddled with by preda on 2018-06-29 at 22:16 |
|
|
|
|
|
|
#33 | |
|
"Mihai Preda"
Apr 2015
101101011002 Posts |
Quote:
Maybe one could try a "double SP" FFT (using two SP instead of one DP), but most likely that'd be slower than DP (and a lot more complicated). Last fiddled with by preda on 2018-06-29 at 22:22 |
|
|
|
|
![]() |
Similar Threads
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| 128 bit integer division in CUDA? | cseizert | GPU Computing | 8 | 2016-11-27 15:41 |
| Non-power-of-two FFTs | jasonp | Computer Science & Computational Number Theory | 15 | 2014-06-10 14:49 |
| P95 PrimeNet causes BSOD; small FFTs, large FFTs, and blend test don't | KarateF22 | PrimeNet | 16 | 2013-10-28 00:34 |
| In Place Large FFTs Failure | nwb | Information & Answers | 2 | 2011-07-08 16:04 |
| gmp-ecm and FFTs | dave_dm | Factoring | 9 | 2004-09-04 11:47 |