![]() |
|
|
#1 |
|
May 2004
FRANCE
58010 Posts |
Hi All,
I just released llrp Version 3.8.0, a portable version of LLR 3.8.4 using FFTW. It uses a portable adaptation of George Woltman's gwnum library, but, at the contrary of llrpi which is stand alone, it uses Matteo Frigo and Steven G. Johnson's FFTW library, Version 3.2.2 The main advantage of using this library is that a power of two FFT length is no more required, which may sometimes gain a factor of two in speed! You may download the source and the x86 binaries from my development directory : http://jpenne.free.fr/Development/llrp380devsrc.zip is the compressed source directory. http://jpenne.free.fr/Development/llrp380dev.zip is the Windows (console) executable. http://jpenne.free.fr/Development/llrp380devlinux.zip is the Linux one. http://jpenne.free.fr/Development/llrp380devslinux.zip is the static Linux one. Indeed, this program is much slower than LLR 3.8.4 when run on x86 platforms, but be not disappointed! George's assembler code is so optimized that I think it could hardly be surpassed on these machines... But I think that this work can help for several other tasks : - It allows to verify positive results on non-x86 machines. - It would not be too difficult (I hope) to implement it on new hardwares such as GPU's, and I think it could help CUDA developpers. Best Regards, Jean |
|
|
|
|
|
#2 |
|
"Mark"
Apr 2003
Between here and the
143228 Posts |
How difficult would it be to modify to use YEAFFT? YEAFFT is faster than FFTW.
|
|
|
|
|
|
#3 |
|
May 2004
FRANCE
11048 Posts |
If I can pass an array of doubles to this library, do a real or complex FFT transform on it, get the result in one or two array(s) of doubles... If I can do the same things for the inverse FFT transforms, it would not be difficult to use YEAFFT.
Unfortunately it seems to me that it is not so easy... Am I wrong ? Jean |
|
|
|
|
|
#4 | |
|
"Mark"
Apr 2003
Between here and the
2×32×353 Posts |
Quote:
|
|
|
|
|
|
|
#5 | |
|
May 2004
FRANCE
22×5×29 Posts |
Quote:
AFAIK, YEAFFT is not really an FFT library, but rather, a part of Glucas internals... It does essentially convolutions products of arrays of doubles, using real direct and inverse FFT transforms, and benefits of IBDWT only in the case of squarings (multiplications ?) modulo a Mersenne number. In short, it is a portable version of George Woltman's gwnum library versions written before year 2004, without the restriction on base two numbers, but with multiplications intricated in its code... What I need is a library that do real and complex direct and inverse FFT transforms, and let me do the dyadic multiplications! Mark, Phil, let me know if I am wrong! Jean |
|
|
|
|
|
|
#6 | |
|
"Mark"
Apr 2003
Between here and the
2×32×353 Posts |
Quote:
Unfortunately it has not been kept up to date. I don't know what happened to Guillermo. I presume that he lost interest in it years ago. It does support multiplications in addition to squarings. AFAIU, it does not support any complex FFTs. If it could do some of these things (and a few more), I would have attempted to build PFGW with it. Last fiddled with by rogue on 2011-05-09 at 12:31 |
|
|
|
|
|
|
#7 | |
|
May 2004
FRANCE
22×5×29 Posts |
Quote:
Jean |
|
|
|
|
|
|
#8 |
|
Mar 2007
Austria
2·151 Posts |
Bug: segfaults when testing 1525*2^700391-1, before starting LL loop.
Seems to fail for every large-k/large-n combination. Nugget
|
|
|
|
|
|
#9 | |
|
May 2004
FRANCE
58010 Posts |
Quote:
I tested your candidate on my Linux Kubuntu system and had no problem : ~/llrp380devsrc/linuxllr $ ./llrp -a2 -oVerbose=1 -d -q"1525*2^700391-1" Starting Lucas Lehmer Riesel prime test of 1525*2^700391-1 Using zero-padded rational base DWT, FFT length = 86016 V1 = 4 ; Computing U0...done. 1525*2^700391-1 is prime! Time : 10193.227 sec. I am running the same test on Windows 2000 and got no problems until now, but it is three times slower, so only 27% done at this time... Now completed, an no more problem : I:\D\Jean\llrp380dev>llrp -a4 -oVerbose=1 -d -q"1525*2^700391-1" Starting Lucas Lehmer Riesel prime test of 1525*2^700391-1 Using zero-padded rational base DWT, FFT length = 86016 V1 = 4 ; Computing U0...done. 1525*2^700391-1 is prime! Time : 32613.687 sec. Regards, Jean |
|
|
|
|
|
|
#10 | |
|
Jul 2009
Tokyo
11428 Posts |
Quote:
|
|
|
|
|
|
|
#11 |
|
107 Posts |
A couple of suggestions regarding performance:
1) Try the r2c interface rather than the r2r interface for real-input FFTs. The r2c interface usually gives better performance. In particular, r2c exploits SIMD instructions (e.g. SSE2) whereas r2r does not. (For example, on my x86-64 Xeon for a size-16k transform, r2c is about 60% faster than r2r. I'm assuming you've compiled FFTW with --enable-sse2. The difference gets even larger with FFTW 3.3 on new machines with AVX instructions.) 2) Try FFTW_PATIENT rather than FFTW_MEASURE. For smallish transforms, it doesn't make much difference, but for > 32k it can make a difference, and may be worth it if you are doing lots of of transforms of the same size. Regards, Steven G. Johnson |
|
![]() |
| Thread Tools | |
Similar Threads
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| Question about FFTW vs. gwnum? | hangxanh | Software | 2 | 2014-10-24 15:48 |
| FFTW vs. gwnum? | ixfd64 | Software | 7 | 2011-09-20 16:05 |
| llrpi : a full portable version of LLR | Jean Penné | Software | 8 | 2011-01-29 21:44 |
| idea: a portable version of Prime95 | ixfd64 | Software | 3 | 2010-10-07 14:18 |
| Prime95 on Playstation Portable | jasong | Software | 4 | 2007-03-12 07:15 |