![]() |
|
|
#12 |
|
"David"
Jul 2015
Ohio
11×47 Posts |
You’re right - I was only accounting for one pass. Of course I’m not sure how all the constants translate into integer based calculations, but constant in hardware are significantly faster than constants in software/CUDA.
|
|
|
|
|
|
#13 |
|
"Mihai Preda"
Apr 2015
22·3·112 Posts |
I'm personally very interested in how this turns out; I wish you all good luck.
I did try to implement a FGT (fast gallois transform) integer convolution in OpenCL/GCN, but somehow the performance was disappointing and I didn't investigate further. (I'm not sure what part of the blame goes to the compiler for generating poor integer code). Just mentioning, a radically different integer approach for convolution is "Nussbaumer convolution" described in Richard Crandall "Topics in Advanced Scientific Computation". This requires mostly integer additions! no multiplications, or constant tables. The drawback is in the memory access pattern, which is not fit for CPU or GPU architectures (low locality), but who knows may be OK for something different like FPGA. Another drawback is that the "Nussbaumer convolution" naturally computes a negacyclic convolution, instead of a cyclic convolution that we need for mersenne modulo. |
|
|
|
|
|
#15 | |
|
"Marv"
May 2009
near the Tannhäuser Gate
3·269 Posts |
Quote:
Just wondering if you have started this yet. I have an interest in what you propose. |
|
|
|
|
|
|
#16 |
|
Tribal Bullet
Oct 2004
5·23·31 Posts |
Very basic stuff only; real life is swamping everything for the foreseeable future.
|
|
|
|
|
|
#17 | |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
24×3×163 Posts |
Quote:
On NVIDIA, there's already CUDALucas and gpulucas for LL in CUDA, nothing via OpenCl; nothing yet on NVIDIA for PRP in CUDA or OpenCl. Neither LL program has the Jacobi check, which would be a nice addition. Preda's contemplating addressing NVIDIA somehow with gpuOwL or a variant; avoiding duplication might be good. A summary of current software available is described at http://www.mersenneforum.org/showthread.php?t=22450 and http://www.mersenneforum.org/showthread.php?t=23371 |
|
|
|
|
|
|
#18 |
|
Tribal Bullet
Oct 2004
5·23·31 Posts |
In my mind the only reason to attempt something like this would be to speed up enough consumer grade GPUs to measurably increase the throughput of GIMPS. I've been toying with number theoretic FFTs off and on for almost 20 years now and it would be nice to find a real niche for them in a project like this.
|
|
|
|
|
|
#19 | |
|
P90 years forever!
Aug 2002
Yeehaw, FL
17·487 Posts |
Quote:
|
|
|
|
|
|
|
#20 |
|
Tribal Bullet
Oct 2004
DED16 Posts |
There's some interesting reading on the subject here.
It's ironic that FFTs using Winograd's prime factor algorithm were developed in the 70s and 80s to optimally reduce the number of multiplications in FFTs, and suddenly after 30 years that really matters for integer FFTs where multiplies have lower throughput than anything else. A single modular multiply requires one high-half and two low-half integer multiplications when the prime modulus and the underlying arithmetic is chosen carefully. I don't know if the overhead of an integer FFT using 30-bit prime moduli would swamp the savings in throughput from avoiding double precision floating point; my gut says the integer option would be faster on lower-end cards. |
|
|
|
|
|
#21 |
|
"David"
Jul 2015
Ohio
20516 Posts |
I’m still interested in this for FPGAs
I happen to have ended up with exceptional access to a very large number of High end Xilinx FPGAs, and I would be more than happy to provide one or two on loan to anyone who has time or has a grad student to implement some GIMPS factoring or LL algorithms in them. |
|
|
|
|
|
#22 | |
|
Sep 2016
17C16 Posts |
Quote:
30-bit primes have enough roots-of-unity to do convolutions of around a billion bits, but you also need roots-of-two. And since there's no observable correlation between them, the longest transform length that you can do with 30-bit primes will probably only be around 2^16-ish. More context here with 64-bit primes: http://www.mersenneforum.org/showthr...t=unity&page=4 Last fiddled with by Mysticial on 2018-06-05 at 15:55 |
|
|
|
|
![]() |
Similar Threads
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| 128 bit integer division in CUDA? | cseizert | GPU Computing | 8 | 2016-11-27 15:41 |
| Non-power-of-two FFTs | jasonp | Computer Science & Computational Number Theory | 15 | 2014-06-10 14:49 |
| P95 PrimeNet causes BSOD; small FFTs, large FFTs, and blend test don't | KarateF22 | PrimeNet | 16 | 2013-10-28 00:34 |
| In Place Large FFTs Failure | nwb | Information & Answers | 2 | 2011-07-08 16:04 |
| gmp-ecm and FFTs | dave_dm | Factoring | 9 | 2004-09-04 11:47 |