20110508, 13:15  #1 
May 2004
FRANCE
562_{10} Posts 
llrp, a portable version of LLR using FFTW
Hi All,
I just released llrp Version 3.8.0, a portable version of LLR 3.8.4 using FFTW. It uses a portable adaptation of George Woltman's gwnum library, but, at the contrary of llrpi which is stand alone, it uses Matteo Frigo and Steven G. Johnson's FFTW library, Version 3.2.2 The main advantage of using this library is that a power of two FFT length is no more required, which may sometimes gain a factor of two in speed! You may download the source and the x86 binaries from my development directory : http://jpenne.free.fr/Development/llrp380devsrc.zip is the compressed source directory. http://jpenne.free.fr/Development/llrp380dev.zip is the Windows (console) executable. http://jpenne.free.fr/Development/llrp380devlinux.zip is the Linux one. http://jpenne.free.fr/Development/llrp380devslinux.zip is the static Linux one. Indeed, this program is much slower than LLR 3.8.4 when run on x86 platforms, but be not disappointed! George's assembler code is so optimized that I think it could hardly be surpassed on these machines... But I think that this work can help for several other tasks :  It allows to verify positive results on nonx86 machines.  It would not be too difficult (I hope) to implement it on new hardwares such as GPU's, and I think it could help CUDA developpers. Best Regards, Jean 
20110508, 15:48  #2 
"Mark"
Apr 2003
Between here and the
2×3,023 Posts 
How difficult would it be to modify to use YEAFFT? YEAFFT is faster than FFTW.

20110508, 19:12  #3 
May 2004
FRANCE
2×281 Posts 
YEAFFT usage for LLR
If I can pass an array of doubles to this library, do a real or complex FFT transform on it, get the result in one or two array(s) of doubles... If I can do the same things for the inverse FFT transforms, it would not be difficult to use YEAFFT.
Unfortunately it seems to me that it is not so easy... Am I wrong ? Jean 
20110508, 20:16  #4  
"Mark"
Apr 2003
Between here and the
2·3,023 Posts 
Quote:


20110509, 06:33  #5  
May 2004
FRANCE
2·281 Posts 
Quote:
AFAIK, YEAFFT is not really an FFT library, but rather, a part of Glucas internals... It does essentially convolutions products of arrays of doubles, using real direct and inverse FFT transforms, and benefits of IBDWT only in the case of squarings (multiplications ?) modulo a Mersenne number. In short, it is a portable version of George Woltman's gwnum library versions written before year 2004, without the restriction on base two numbers, but with multiplications intricated in its code... What I need is a library that do real and complex direct and inverse FFT transforms, and let me do the dyadic multiplications! Mark, Phil, let me know if I am wrong! Jean 

20110509, 12:30  #6  
"Mark"
Apr 2003
Between here and the
2·3,023 Posts 
Quote:
Unfortunately it has not been kept up to date. I don't know what happened to Guillermo. I presume that he lost interest in it years ago. It does support multiplications in addition to squarings. AFAIU, it does not support any complex FFTs. If it could do some of these things (and a few more), I would have attempted to build PFGW with it. Last fiddled with by rogue on 20110509 at 12:31 

20110509, 12:58  #7  
May 2004
FRANCE
1062_{8} Posts 
Quote:
Jean 

20110516, 18:52  #8 
Mar 2007
Austria
2×151 Posts 
Bug: segfaults when testing 1525*2^7003911, before starting LL loop.
Seems to fail for every largek/largen combination. Nugget 
20110518, 05:21  #9  
May 2004
FRANCE
2×281 Posts 
Quote:
I tested your candidate on my Linux Kubuntu system and had no problem : ~/llrp380devsrc/linuxllr $ ./llrp a2 oVerbose=1 d q"1525*2^7003911" Starting Lucas Lehmer Riesel prime test of 1525*2^7003911 Using zeropadded rational base DWT, FFT length = 86016 V1 = 4 ; Computing U0...done. 1525*2^7003911 is prime! Time : 10193.227 sec. I am running the same test on Windows 2000 and got no problems until now, but it is three times slower, so only 27% done at this time... Now completed, an no more problem : I:\D\Jean\llrp380dev>llrp a4 oVerbose=1 d q"1525*2^7003911" Starting Lucas Lehmer Riesel prime test of 1525*2^7003911 Using zeropadded rational base DWT, FFT length = 86016 V1 = 4 ; Computing U0...done. 1525*2^7003911 is prime! Time : 32613.687 sec. Regards, Jean 

20110630, 08:53  #10  
Jul 2009
Tokyo
2·5·61 Posts 
Quote:


20110701, 23:32  #11 
20273_{8} Posts 
performance comments
A couple of suggestions regarding performance:
1) Try the r2c interface rather than the r2r interface for realinput FFTs. The r2c interface usually gives better performance. In particular, r2c exploits SIMD instructions (e.g. SSE2) whereas r2r does not. (For example, on my x8664 Xeon for a size16k transform, r2c is about 60% faster than r2r. I'm assuming you've compiled FFTW with enablesse2. The difference gets even larger with FFTW 3.3 on new machines with AVX instructions.) 2) Try FFTW_PATIENT rather than FFTW_MEASURE. For smallish transforms, it doesn't make much difference, but for > 32k it can make a difference, and may be worth it if you are doing lots of of transforms of the same size. Regards, Steven G. Johnson 
Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Question about FFTW vs. gwnum?  hangxanh  Software  2  20141024 15:48 
FFTW vs. gwnum?  ixfd64  Software  7  20110920 16:05 
llrpi : a full portable version of LLR  Jean Penné  Software  8  20110129 21:44 
idea: a portable version of Prime95  ixfd64  Software  3  20101007 14:18 
Prime95 on Playstation Portable  jasong  Software  4  20070312 07:15 