mersenneforum.org  

Old 2010-05-12, 06:39   #1
Jean Penné
 
 
May 2004
FRANCE

597₁₀ Posts
LLR Version 3.8.1 is now available!

Hi All,

Yesterday I uploaded the binaries and source of the new version 3.8.1 of LLR.
For now it is a development version, and the zip files are in my development directory:

http://jpenne.free.fr/Development/

I will release it as the "official" version as soon as it seems stable enough; I hope that will be very soon...

I made the source available because I cannot build the MacIntel binary myself. I would be grateful if someone would send me the result of the build...

- This version uses the most recent version, 25.14, of George Woltman's gwnum library.

- It now allows multiple data formats in the input file, a feature requested by the LLRnet developers.

- PRP testing of (generalized) repunits has been implemented.

- Error checking and recovery have been made more rigorous (I hope...).
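For readers unfamiliar with the new target numbers: a generalized repunit is (b^n - 1)/(b - 1), i.e. n ones in base b, and a minimal probable-prime check on one can be sketched in a few lines of Python. (This is a plain Fermat test for illustration only, not LLR's gwnum-based code.)

```python
def repunit(b, n):
    """Generalized repunit: n ones in base b, i.e. (b**n - 1) // (b - 1)."""
    return (b**n - 1) // (b - 1)

def is_fermat_prp(n, a=3):
    """Fermat probable-prime test to base a: a**(n-1) == 1 (mod n)."""
    return n > 2 and pow(a, n - 1, n) == 1

# R(10, 19) = 1111111111111111111 is a known decimal repunit prime,
# while R(10, 4) = 1111 = 11 * 101 is composite.
```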

Happy prime number hunting, and Best Regards,

Jean
Jean Penné is offline   Reply With Quote
Old 2010-05-12, 07:21   #2
kar_bon
 
 
Mar 2006
Germany

101110100001₂ Posts

Quote:
Originally Posted by Jean Penné View Post
- It now allows multiple data formats in the input file, a feature requested by the LLRnet developers.
Thanks for this implementation! I will test it in the next few days.
kar_bon is offline   Reply With Quote
Old 2010-05-12, 08:03   #3
frmky
 
 
Jul 2003
So Cal

3²×5²×11 Posts

Jean,

Would it be difficult to produce (if you haven't done so already) a version of LLR based on FFTW? I ask not because anyone would want to use an FFTW version (too slow) but because the CUDA cufft libraries are based on the FFTW model. Starting from MacLucasFFTW, converting the FFTW calls to cufft, writing a normalization routine in CUDA to avoid having to transfer data off and back onto the GPU on every iteration, plus a few clever tricks, msft has produced a fast LL testing program. It is limited to power-of-two FFTs, but it completes an LL iteration using a 2048K FFT in 10.6 ms on a GTX 260 and 5.5 ms on a GTX 480, and should be about 4x faster on the Tesla C20X0's. Similar performance should be possible for LLR.
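The core operation such a program accelerates, squaring a multi-digit number via an FFT-based convolution, can be sketched in plain numpy. (CPU-side and unoptimized, for illustration only; the little-endian digit representation and simple carry handling here are assumptions, not msft's actual CUDA code.)

```python
import numpy as np

def fft_square(digits, base=10):
    """Square the little-endian digit vector via FFT convolution,
    round the raw outputs back to integers, then propagate carries."""
    size = 2 * len(digits)               # room for the full product
    fa = np.fft.rfft(np.array(digits, dtype=float), size)
    raw = np.fft.irfft(fa * fa, size)    # linear convolution of digits with itself
    coeffs = np.rint(raw).astype(np.int64)
    out, carry = [], 0
    for c in coeffs:
        carry += int(c)
        out.append(carry % base)
        carry //= base
    while carry:
        out.append(carry % base)
        carry //= base
    return out                           # little-endian digits of the square

def to_int(digits, base=10):
    return sum(d * base**i for i, d in enumerate(digits))
```

The point of keeping the whole pipeline (forward transform, pointwise squaring, inverse transform, normalization) on the GPU is exactly that the digit arrays never have to cross the PCIe bus between iterations.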
frmky is offline   Reply With Quote
Old 2010-05-12, 12:22   #4
rogue
 
 
"Mark"
Apr 2003
Between here and the

2⁴×421 Posts

Quote:
Originally Posted by Jean Penné View Post
Hi All,

Yesterday I uploaded the binaries and source of the new version 3.8.1 of LLR.
For now it is a development version, and the zip files are in my development directory:

http://jpenne.free.fr/Development/

I will release it as the "official" version as soon as it seems stable enough; I hope that will be very soon...

I made the source available because I cannot build the MacIntel binary myself. I would be grateful if someone would send me the result of the build...

- This version uses the most recent version, 25.14, of George Woltman's gwnum library.

- It now allows multiple data formats in the input file, a feature requested by the LLRnet developers.

- PRP testing of (generalized) repunits has been implemented.

- Error checking and recovery have been made more rigorous (I hope...).

Happy prime number hunting, and Best Regards,

Jean
Jean, if nobody else has volunteered, I can do this later.
rogue is offline   Reply With Quote
Old 2010-05-12, 13:57   #5
Jean Penné
 
 
May 2004
FRANCE

3×199 Posts
FFTW-based LLR

Quote:
Originally Posted by frmky View Post
Jean,

Would it be difficult to produce (if you haven't done so already) a version of LLR based on FFTW? I ask not because anyone would want to use an FFTW version (too slow) but because the CUDA cufft libraries are based on the FFTW model. Starting from MacLucasFFTW, converting the FFTW calls to cufft, writing a normalization routine in CUDA to avoid having to transfer data off and back onto the GPU on every iteration, plus a few clever tricks, msft has produced a fast LL testing program. It is limited to power-of-two FFTs, but it completes an LL iteration using a 2048K FFT in 10.6 ms on a GTX 260 and 5.5 ms on a GTX 480, and should be about 4x faster on the Tesla C20X0's. Similar performance should be possible for LLR.

Are you a seer? I am developing a portable version of LLR based on FFTW!
I have not released it yet because:

- It works only on power-of-two FFTs.
- I have implemented only the full IBDWT method (not yet the zero-padded one), so the accepted k values are at most 20 bits wide...
- It is about three times slower than gwnum-based LLR.
- It sometimes yields unexplained false negative results (even though there are no excessive round-off errors...).
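The round-off check mentioned in the last point is typically done by measuring how far the raw convolution outputs land from the nearest integer before rounding. A sketch in plain numpy (the 0.25 threshold is illustrative, not LLR's actual criterion):

```python
import numpy as np

def convolution_error(digits):
    """Largest distance of any raw FFT-convolution output from the
    nearest integer; values approaching 0.5 mean the rounded result
    can no longer be trusted."""
    size = 2 * len(digits)
    fa = np.fft.rfft(np.array(digits, dtype=float), size)
    raw = np.fft.irfft(fa * fa, size)
    return float(np.max(np.abs(raw - np.rint(raw))))

# a program would typically abort, or retry with a larger FFT,
# whenever convolution_error(...) exceeds a threshold such as 0.25
```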
I knew nothing about the CUDA cufft libraries, so I am very interested!
Is this code faster than FFTW?
Is it available for non-x86 machines?
Thank you for your interesting message, and best regards,
Jean
Jean Penné is offline   Reply With Quote
Old 2010-05-12, 14:00   #6
Jean Penné
 
 
May 2004
FRANCE

3·199 Posts

Quote:
Originally Posted by rogue View Post
Jean, if nobody else has volunteered, I can do this later.
Thank you in advance, Mark!
Jean Penné is offline   Reply With Quote
Old 2010-05-12, 14:27   #7
ET_
Banned
 
 
"Luigi"
Aug 2002
Team Italia

12EB₁₆ Posts

Quote:
Originally Posted by Jean Penné View Post
Are you a seer? I am developing a portable version of LLR based on FFTW!
I have not released it yet because:

- It works only on power-of-two FFTs.
- I have implemented only the full IBDWT method (not yet the zero-padded one), so the accepted k values are at most 20 bits wide...
- It is about three times slower than gwnum-based LLR.
- It sometimes yields unexplained false negative results (even though there are no excessive round-off errors...).

I knew nothing about the CUDA cufft libraries, so I am very interested!
Is this code faster than FFTW?
Is it available for non-x86 machines?
Thank you for your interesting message, and best regards,
Jean
PG-00000-003_V1.1
NVIDIA
CUFFT Library
This document describes CUFFT, the NVIDIA® CUDA™ (compute
unified device architecture) Fast Fourier Transform (FFT) library. The
FFT is a divide‐and‐conquer algorithm for efficiently computing
discrete Fourier transforms of complex or real‐valued data sets, and it
is one of the most important and widely used numerical algorithms,
with applications that include computational physics and general
signal processing. The CUFFT library provides a simple interface for
computing parallel FFTs on an NVIDIA GPU, which allows users to
leverage the floating‐point power and parallelism of the GPU without
having to develop a custom, GPU‐based FFT implementation.
FFT libraries typically vary in terms of supported transform sizes and
data types. For example, some libraries only implement Radix‐2 FFTs,
restricting the transform size to a power of two, while other
implementations support arbitrary transform sizes. This version of the
CUFFT library supports the following features:
- 1D, 2D, and 3D transforms of complex and real‐valued data.
- Batch execution for doing multiple 1D transforms in parallel.
- 2D and 3D transform sizes in the range [2, 16384] in any
dimension.
- 1D transform sizes up to 8 million elements.
- In‐place and out‐of‐place transforms for real and complex data.

The CUFFT API is modeled after FFTW (see http://www.fftw.org),
which is one of the most popular and efficient CPU-based FFT
libraries. FFTW provides a simple configuration mechanism called a
plan that completely specifies the optimal—that is, the minimum
floating-point operation (flop)—plan of execution for a particular FFT
size and data type. The advantage of this approach is that once the
user creates a plan, the library stores whatever state is needed to
execute the plan multiple times without recalculation of the
configuration. The FFTW model works well for CUFFT because
different kinds of FFTs require different thread configurations and
GPU resources, and plans are a simple way to store and reuse
configurations.
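The plan idiom described above can be illustrated in miniature: precompute everything that depends only on the transform size once, then reuse it across executions. (A plain-Python radix-2 sketch of the concept, not the CUFFT or FFTW API.)

```python
import cmath

class FFTPlan:
    """Toy 'plan': the twiddle factors for size n are computed once
    in the constructor and reused by every execute() call."""
    def __init__(self, n):
        assert n & (n - 1) == 0, "radix-2: size must be a power of two"
        self.n = n
        self.twiddles = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]

    def execute(self, x):
        n = self.n
        a = list(x)
        # bit-reversal permutation
        j = 0
        for i in range(1, n):
            bit = n >> 1
            while j & bit:
                j ^= bit
                bit >>= 1
            j |= bit
            if i < j:
                a[i], a[j] = a[j], a[i]
        # iterative Cooley-Tukey butterflies, reusing precomputed twiddles
        size = 2
        while size <= n:
            half, step = size // 2, n // size
            for start in range(0, n, size):
                for k in range(half):
                    w = self.twiddles[k * step]
                    u, v = a[start + k], a[start + k + half] * w
                    a[start + k] = u + v
                    a[start + k + half] = u - v
            size *= 2
        return a
```

Creating the plan once and calling `execute` in a loop mirrors the FFTW/CUFFT usage pattern the excerpt describes.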

The CUFFT library implements several FFT algorithms, each having
different performance and accuracy. The best performance paths
correspond to transform sizes that meet two criteria:

1. Fit in CUDA's shared memory
2. Are powers of a single factor (for example, powers of two)

These transforms are also the most accurate due to the numeric
stability of the chosen FFT algorithm. For transform sizes that meet the
first criterion but not the second, CUFFT uses a more general mixed-radix
FFT algorithm that is usually slower and less numerically accurate.
Therefore, if possible it is best to use sizes that are powers of two or
four, or powers of other small primes (such as three, five, or seven). In
addition, the power-of-two FFT algorithm in CUFFT makes maximum
use of shared memory by blocking sub-transforms for signals that do
not meet the first criterion.

http://developer.download.nvidia.com...ibrary_1.1.pdf

HTH

Luigi

Last fiddled with by ET_ on 2010-05-12 at 14:33
ET_ is offline   Reply With Quote
Old 2010-05-12, 19:10   #8
frmky
 
 
Jul 2003
So Cal

3²×5²×11 Posts

Quote:
Originally Posted by Jean Penné View Post
Is this code faster than FFTW?
Is it available for non x86 machines?
It runs only on NVIDIA graphics cards supporting double-precision operations, mainly the GTX 260 and higher. (Single-precision FFTs are also supported, but are not suitable for FFT integer multiplication.) On these cards, it is as fast as or faster than (depending on the card) gwnum on the latest CPUs.

For example, on a recent test, one Prime95 thread required about 33 ms/iteration for a 1024K FFT. The CUDA program, limited to power-of-two FFTs, used a 2048K FFT for the same number (nearly twice as large!) but is still 3-6 times faster on current cards (10.6 ms/iter on a GTX 260 and 5.5 ms/iter on a GTX 480), and should be about 20 times faster on the Tesla C20X0's when they are released in a few months. To test the hardware on the new GTX 480 card we just got, I'm running a double check of M42643801 using a 4096K FFT, and it will take just over 5 days. A Tesla C20X0 should do that in less than 36 hours.
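As a sanity check on those figures (assuming, illustratively, that a 4096K FFT costs about twice the 5.5 ms/iteration measured for the 2048K FFT on the GTX 480):

```python
p = 42643801                 # exponent being double-checked
iterations = p - 2           # an LL test of M_p takes p - 2 squarings
ms_per_iteration = 2 * 5.5   # assumed cost of a 4096K FFT on a GTX 480
days = iterations * ms_per_iteration / 1000.0 / 86400.0
print(round(days, 1))        # a bit over 5 days, matching the estimate
```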

Last fiddled with by frmky on 2010-05-12 at 19:14
frmky is offline   Reply With Quote
Old 2010-05-12, 19:38   #9
Jean Penné
 
 
May 2004
FRANCE

597₁₀ Posts
LLR on CUDA?

Quote:
Originally Posted by frmky View Post
It runs only on NVIDIA graphics cards supporting double-precision operations, mainly the GTX 260 and higher. (Single-precision FFTs are also supported, but are not suitable for FFT integer multiplication.) On these cards, it is as fast as or faster than (depending on the card) gwnum on the latest CPUs.

For example, on a recent test, one Prime95 thread required about 33 ms/iteration for a 1024K FFT. The CUDA program, limited to power-of-two FFTs, used a 2048K FFT for the same number (nearly twice as large!) but is still 3-6 times faster on current cards (10.6 ms/iter on a GTX 260 and 5.5 ms/iter on a GTX 480), and should be about 20 times faster on the Tesla C20X0's when they are released in a few months. To test the hardware on the new GTX 480 card we just got, I'm running a double check of M42643801 using a 4096K FFT, and it will take just over 5 days. A Tesla C20X0 should do that in less than 36 hours.

Thank you, ET and frmky, for this information!
I realize I was totally inexperienced with GPU computing!
It seems very promising for the future of fast primality testing, but unfortunately I own neither the hardware nor the software to develop a CUDA program, so how could I go about it?
Regards,
Jean
Jean Penné is offline   Reply With Quote
Old 2010-05-12, 21:30   #10
ET_
Banned
 
 
"Luigi"
Aug 2002
Team Italia

29·167 Posts

Quote:
Originally Posted by Jean Penné View Post
Thank you, ET and frmky, for this information!
I realize I was totally inexperienced with GPU computing!
It seems very promising for the future of fast primality testing, but unfortunately I own neither the hardware nor the software to develop a CUDA program, so how could I go about it?
Regards,
Jean
The software, as well as all the needed documentation, can be freely downloaded from the NVIDIA website.
There is also an "emulator" option for testing software while not yet owning a GPU card.

As for the hardware... I guess that when the new Tesla cards are released, the older GTX 260 and 275 will get cheaper.

Luigi

Last fiddled with by ET_ on 2010-05-12 at 21:31
ET_ is offline   Reply With Quote
Old 2010-05-12, 22:58   #11
TheJudger
 
 
"Oliver"
Mar 2005
Germany

2³×139 Posts

Hi Jean,

Quote:
Originally Posted by ET_ View Post
There is also an "emulator" option for testing software while not yet owning a GPU card.
AFAIK, "device emulation" is no longer supported as of CUDA 3.0.
It is single-threaded (so it won't reveal race conditions) and runs very slowly (at least for my code).

Oliver

Last fiddled with by TheJudger on 2010-05-12 at 22:58
TheJudger is offline   Reply With Quote